Mobile: +91 9491392912 | vamshi.krishna.prime@gmail.com
Case Study Overview
===================
In the telecommunications industry, customers tend to switch operators when they are not offered attractive schemes and deals, so it is critical for any telecom operator to prevent existing customers from churning to competitors. As a data scientist, the task in this case study is to build an ML model that predicts whether a customer will churn in a particular month, based on past data.
Data Exploration
================
Load all relevant libraries in this section to ease maintenance and tracking.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import matplotlib.patches as patches
%matplotlib inline
import seaborn as sb
import traceback
import math
import datetime
Import data dictionary to understand the variables:
# import data dictionary
try:
info_df = pd.read_csv('data_dictionary.csv')
print('Data Load: Successful')
except Exception:
traceback.print_exc()
print('Data Load: Failure')
Data Load: Successful
Import train dataset:
# import train dataset
try:
train_df = pd.read_csv('train.csv', delimiter=',')
print('Data Load: Successful')
except Exception:
traceback.print_exc()
print('Data Load: Failure')
Data Load: Successful
Visually inspect the `data dictionary`:
info_df.style
| | Acronym | Description |
|---|---|---|
| 0 | CIRCLE_ID | Telecom circle area to which the customer belongs to |
| 1 | LOC | Local calls within same telecom circle |
| 2 | STD | STD calls outside the calling circle |
| 3 | IC | Incoming calls |
| 4 | OG | Outgoing calls |
| 5 | T2T | Operator T to T, i.e., within the same operator, mobile to mobile |
| 6 | T2M | Operator T to other operator mobile |
| 7 | T2O | Operator T to other operator fixed line |
| 8 | T2F | Operator T to fixed lines of T |
| 9 | T2C | Operator T to its own call center |
| 10 | ARPU | Average revenue per user |
| 11 | MOU | Minutes of usage voice calls |
| 12 | AON | Age on network: number of days the customer has been on operator T's network |
| 13 | ONNET | All kind of calls within the same operator network |
| 14 | OFFNET | All kind of calls outside the operator T network |
| 15 | ROAM | Indicates that customer is in roaming zone during the call |
| 16 | SPL | Special calls |
| 17 | ISD | ISD calls |
| 18 | RECH | Recharge |
| 19 | NUM | Number |
| 20 | AMT | Amount in local currency |
| 21 | MAX | Maximum |
| 22 | DATA | Mobile internet |
| 23 | 3G | 3G network |
| 24 | AV | Average |
| 25 | VOL | Mobile internet usage volume in MB |
| 26 | 2G | 2G network |
| 27 | PCK | Prepaid service schemes called PACKS |
| 28 | NIGHT | Scheme to use during specific night hours only |
| 29 | MONTHLY | Service schemes with validity equivalent to a month |
| 30 | SACHET | Service schemes with validity smaller than a month |
| 31 | *.6 | KPI for the month of June |
| 32 | *.7 | KPI for the month of July |
| 33 | *.8 | KPI for the month of August |
| 34 | FB_USER | Service scheme to avail services of Facebook and similar social networking sites |
| 35 | VBC | Volume-based cost: applies when no specific scheme is purchased, and usage is paid for as it occurs |
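Per the dictionary, each monthly KPI appears three times with a `_6`/`_7`/`_8` suffix (June/July/August), while static attributes such as `aon` carry no suffix. A minimal sketch of grouping columns by that suffix convention (the column list here is a small hypothetical subset, not the full 172):

```python
# Hypothetical subset of column names following the month-suffix convention
cols = ['arpu_6', 'arpu_7', 'arpu_8', 'aon', 'onnet_mou_6', 'onnet_mou_7']

def split_by_month(columns):
    """Group column names by their month suffix (_6 = June, _7 = July, _8 = August)."""
    months = {'6': [], '7': [], '8': []}
    static = []  # columns without a month suffix, e.g. 'aon'
    for col in columns:
        suffix = col.rsplit('_', 1)[-1]
        if suffix in months:
            months[suffix].append(col)
        else:
            static.append(col)
    return months, static

months, static = split_by_month(cols)
# months['6'] → ['arpu_6', 'onnet_mou_6'], static → ['aon']
```

Grouping this way makes it easy to compare the same KPI across months later in the analysis.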
Visually inspect the `train` data:
train_df.shape
(69999, 172)
train_df.head()
| id | circle_id | loc_og_t2o_mou | std_og_t2o_mou | loc_ic_t2o_mou | last_date_of_month_6 | last_date_of_month_7 | last_date_of_month_8 | arpu_6 | arpu_7 | ... | sachet_3g_7 | sachet_3g_8 | fb_user_6 | fb_user_7 | fb_user_8 | aon | aug_vbc_3g | jul_vbc_3g | jun_vbc_3g | churn_probability | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 31.277 | 87.009 | ... | 0 | 0 | NaN | NaN | NaN | 1958 | 0.0 | 0.0 | 0.0 | 0 |
| 1 | 1 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 0.000 | 122.787 | ... | 0 | 0 | NaN | 1.0 | NaN | 710 | 0.0 | 0.0 | 0.0 | 0 |
| 2 | 2 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 60.806 | 103.176 | ... | 0 | 0 | NaN | NaN | NaN | 882 | 0.0 | 0.0 | 0.0 | 0 |
| 3 | 3 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 156.362 | 205.260 | ... | 0 | 0 | NaN | NaN | NaN | 982 | 0.0 | 0.0 | 0.0 | 0 |
| 4 | 4 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 240.708 | 128.191 | ... | 1 | 0 | 1.0 | 1.0 | 1.0 | 647 | 0.0 | 0.0 | 0.0 | 0 |
5 rows × 172 columns
train_df.tail()
| id | circle_id | loc_og_t2o_mou | std_og_t2o_mou | loc_ic_t2o_mou | last_date_of_month_6 | last_date_of_month_7 | last_date_of_month_8 | arpu_6 | arpu_7 | ... | sachet_3g_7 | sachet_3g_8 | fb_user_6 | fb_user_7 | fb_user_8 | aon | aug_vbc_3g | jul_vbc_3g | jun_vbc_3g | churn_probability | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 69994 | 69994 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 15.760 | 410.924 | ... | 1 | 0 | NaN | 1.0 | 1.0 | 221 | 0.00 | 0.00 | 0.00 | 0 |
| 69995 | 69995 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 160.083 | 289.129 | ... | 0 | 0 | NaN | NaN | NaN | 712 | 0.00 | 0.00 | 0.00 | 0 |
| 69996 | 69996 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 372.088 | 258.374 | ... | 0 | 0 | NaN | NaN | NaN | 879 | 0.00 | 0.00 | 0.00 | 0 |
| 69997 | 69997 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 238.575 | 245.414 | ... | 0 | 0 | 1.0 | 1.0 | 1.0 | 277 | 664.25 | 1402.96 | 990.97 | 0 |
| 69998 | 69998 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 168.269 | 42.815 | ... | 0 | 0 | 1.0 | NaN | 1.0 | 1876 | 0.00 | 0.00 | 0.00 | 0 |
5 rows × 172 columns
train_df.columns
Index(['id', 'circle_id', 'loc_og_t2o_mou', 'std_og_t2o_mou', 'loc_ic_t2o_mou',
'last_date_of_month_6', 'last_date_of_month_7', 'last_date_of_month_8',
'arpu_6', 'arpu_7',
...
'sachet_3g_7', 'sachet_3g_8', 'fb_user_6', 'fb_user_7', 'fb_user_8',
'aon', 'aug_vbc_3g', 'jul_vbc_3g', 'jun_vbc_3g', 'churn_probability'],
dtype='object', length=172)
train_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69999 entries, 0 to 69998
Columns: 172 entries, id to churn_probability
dtypes: float64(135), int64(28), object(9)
memory usage: 91.9+ MB
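The `info()` output shows 9 object columns; from `head()` these are all dates in `m/d/Y` form (e.g. `6/30/2014`). A hedged sketch of converting such columns with `pd.to_datetime`, using a tiny illustrative frame rather than the real `train_df`:

```python
import pandas as pd

# Illustrative frame mimicking one of the object-typed date columns
df = pd.DataFrame({'last_date_of_month_6': ['6/30/2014', None]})

date_cols = ['last_date_of_month_6']  # in the real data there are 9 such columns
for col in date_cols:
    # format matches the m/d/Y strings seen in head(); missing values become NaT
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y')
```

Converting these up front avoids treating dates as free-form strings in later feature engineering.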
def programmatic_assessment(df):
    '''
    Tabulate null counts, duplicate counts, and descriptive statistics
    for each variable/column of the input dataframe.
    Input args:
        df: dataframe to assess.
    Returns:
        dataframe with one row of statistics per variable.
    '''
    assess_df = pd.DataFrame()
    for i, col in enumerate(df.columns):
        desc = df[col].describe()  # compute once per column instead of once per statistic
        assess_df.loc[i, 'variable'] = col
        assess_df.loc[i, 'datatype'] = df[col].dtype
        assess_df.loc[i, 'null_count'] = df[col].isna().sum()
        assess_df.loc[i, 'null_pcnt'] = df[col].isna().sum() / df.shape[0]
        assess_df.loc[i, 'duplicate_count'] = df[col].duplicated().sum()
        assess_df.loc[i, 'count'] = desc['count']
        # tabulate descriptive statistics for numerical variables
        if df[col].dtype in ['int64', 'float64']:
            for stat in ['mean', 'std', 'min', '25%', '50%', '75%', 'max']:
                assess_df.loc[i, stat] = desc[stat]
        # tabulate descriptive statistics for categorical variables
        elif df[col].dtype in ['O']:
            for stat in ['unique', 'top', 'freq']:
                assess_df.loc[i, stat] = desc[stat]
    return assess_df
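Much of the same per-column summary can also be obtained directly from pandas; a minimal sketch on a toy frame (hypothetical data, transposed so variables become rows as in `programmatic_assessment`):

```python
import numpy as np
import pandas as pd

# Toy frame: one numeric column with a null, one object column with a null
toy = pd.DataFrame({
    'num': [1.0, 2.0, np.nan, 2.0],
    'cat': ['a', 'b', 'b', None],
})

# describe(include='all') covers numeric and object columns in one call
summary = toy.describe(include='all').T
summary['null_count'] = toy.isna().sum()
summary['null_pcnt'] = toy.isna().mean()
```

The custom helper above trades this brevity for explicit control over which statistics appear and in what order.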
def highlight_positives(x, color):
    '''
    Highlight values greater than 0 in the input column of the dataframe.
    Input args:
        x: column of values to style.
        color: color of the text to be displayed.
    '''
    return np.where(x > 0, f"color: {color};", None)
def highlight_values(x, color, threshold):
    '''
    Highlight values greater than the given threshold in the input column of the dataframe.
    Input args:
        x: column of values to style.
        color: color of the text to be displayed.
        threshold: values greater than this are displayed in the specified color.
    '''
    return np.where(x > threshold, f"color: {color}; background-color: lightyellow;", None)
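A quick check of what `highlight_values` returns, redefining the one-liner here so the sketch is self-contained (the 0.05 threshold is purely illustrative):

```python
import numpy as np

def highlight_values(x, color, threshold):
    # returns a CSS style string where x > threshold, else None (no styling)
    return np.where(x > threshold, f"color: {color}; background-color: lightyellow;", None)

# e.g. styling a column of null percentages: only values above 5% get a style
styles = highlight_values(np.array([0.01, 0.10, 0.75]), 'red', 0.05)
# → [None, 'color: red; background-color: lightyellow;', 'color: red; background-color: lightyellow;']
```

Passed to `Styler.apply` with a `subset`, this flags only the cells that cross the chosen threshold.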
assess_df = programmatic_assessment(train_df)
assess_df.style.apply(highlight_positives, color='red', axis=1, subset="null_count")
| variable | datatype | null_count | null_pcnt | duplicate_count | count | mean | std | min | 25% | 50% | 75% | max | unique | top | freq | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | id | int64 | 0.000000 | 0.000000 | 0.000000 | 69999.000000 | 34999.000000 | 20207.115084 | 0.000000 | 17499.500000 | 34999.000000 | 52498.500000 | 69998.000000 | nan | nan | nan |
| 1 | circle_id | int64 | 0.000000 | 0.000000 | 69998.000000 | 69999.000000 | 109.000000 | 0.000000 | 109.000000 | 109.000000 | 109.000000 | 109.000000 | 109.000000 | nan | nan | nan |
| 2 | loc_og_t2o_mou | float64 | 702.000000 | 0.010029 | 69997.000000 | 69297.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 3 | std_og_t2o_mou | float64 | 702.000000 | 0.010029 | 69997.000000 | 69297.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 4 | loc_ic_t2o_mou | float64 | 702.000000 | 0.010029 | 69997.000000 | 69297.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 5 | last_date_of_month_6 | object | 0.000000 | 0.000000 | 69998.000000 | 69999.000000 | nan | nan | nan | nan | nan | nan | nan | 1.000000 | 6/30/2014 | 69999.000000 |
| 6 | last_date_of_month_7 | object | 399.000000 | 0.005700 | 69997.000000 | 69600.000000 | nan | nan | nan | nan | nan | nan | nan | 1.000000 | 7/31/2014 | 69600.000000 |
| 7 | last_date_of_month_8 | object | 733.000000 | 0.010472 | 69997.000000 | 69266.000000 | nan | nan | nan | nan | nan | nan | nan | 1.000000 | 8/31/2014 | 69266.000000 |
| 8 | arpu_6 | float64 | 0.000000 | 0.000000 | 8384.000000 | 69999.000000 | 283.134365 | 334.213918 | -2258.709000 | 93.581000 | 197.484000 | 370.791000 | 27731.088000 | nan | nan | nan |
| 9 | arpu_7 | float64 | 0.000000 | 0.000000 | 8574.000000 | 69999.000000 | 278.185912 | 344.366927 | -1289.715000 | 86.714000 | 191.588000 | 365.369500 | 35145.834000 | nan | nan | nan |
| 10 | arpu_8 | float64 | 0.000000 | 0.000000 | 9805.000000 | 69999.000000 | 278.858826 | 351.924315 | -945.808000 | 84.095000 | 192.234000 | 369.909000 | 33543.624000 | nan | nan | nan |
| 11 | onnet_mou_6 | float64 | 2768.000000 | 0.039543 | 49941.000000 | 67231.000000 | 133.153275 | 299.963093 | 0.000000 | 7.410000 | 34.110000 | 119.390000 | 7376.710000 | nan | nan | nan |
| 12 | onnet_mou_7 | float64 | 2687.000000 | 0.038386 | 49922.000000 | 67312.000000 | 133.894438 | 311.277193 | 0.000000 | 6.675000 | 32.280000 | 115.837500 | 8157.780000 | nan | nan | nan |
| 13 | onnet_mou_8 | float64 | 3703.000000 | 0.052901 | 50273.000000 | 66296.000000 | 132.978257 | 311.896596 | 0.000000 | 6.410000 | 32.100000 | 115.060000 | 10752.560000 | nan | nan | nan |
| 14 | offnet_mou_6 | float64 | 2768.000000 | 0.039543 | 43744.000000 | 67231.000000 | 198.874771 | 316.818355 | 0.000000 | 34.860000 | 96.480000 | 232.990000 | 8362.360000 | nan | nan | nan |
| 15 | offnet_mou_7 | float64 | 2687.000000 | 0.038386 | 43859.000000 | 67312.000000 | 197.153383 | 322.482226 | 0.000000 | 32.240000 | 91.885000 | 227.630000 | 7043.980000 | nan | nan | nan |
| 16 | offnet_mou_8 | float64 | 3703.000000 | 0.052901 | 44003.000000 | 66296.000000 | 196.543577 | 324.089234 | 0.000000 | 31.575000 | 91.800000 | 229.345000 | 14007.340000 | nan | nan | nan |
| 17 | roam_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 64692.000000 | 67231.000000 | 9.765435 | 57.374429 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2850.980000 | nan | nan | nan |
| 18 | roam_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 65758.000000 | 67312.000000 | 7.014568 | 55.960985 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4155.830000 | nan | nan | nan |
| 19 | roam_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 65746.000000 | 66296.000000 | 7.004892 | 53.408135 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4169.810000 | nan | nan | nan |
| 20 | roam_og_mou_6 | float64 | 2768.000000 | 0.039543 | 63430.000000 | 67231.000000 | 14.186457 | 73.469261 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 3775.110000 | nan | nan | nan |
| 21 | roam_og_mou_7 | float64 | 2687.000000 | 0.038386 | 64693.000000 | 67312.000000 | 9.842191 | 58.511894 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2812.040000 | nan | nan | nan |
| 22 | roam_og_mou_8 | float64 | 3703.000000 | 0.052901 | 64783.000000 | 66296.000000 | 9.771783 | 64.618388 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5337.040000 | nan | nan | nan |
| 23 | loc_og_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 58507.000000 | 67231.000000 | 46.904854 | 150.971758 | 0.000000 | 1.660000 | 11.910000 | 40.740000 | 6431.330000 | nan | nan | nan |
| 24 | loc_og_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 58639.000000 | 67312.000000 | 46.166503 | 154.739002 | 0.000000 | 1.650000 | 11.580000 | 39.760000 | 7400.660000 | nan | nan | nan |
| 25 | loc_og_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 58702.000000 | 66296.000000 | 45.686109 | 153.716880 | 0.000000 | 1.610000 | 11.740000 | 39.895000 | 10752.560000 | nan | nan | nan |
| 26 | loc_og_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 52150.000000 | 67231.000000 | 93.238231 | 162.046699 | 0.000000 | 9.920000 | 41.030000 | 110.430000 | 4696.830000 | nan | nan | nan |
| 27 | loc_og_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 52357.000000 | 67312.000000 | 90.799240 | 153.852597 | 0.000000 | 10.090000 | 40.170000 | 107.540000 | 4557.140000 | nan | nan | nan |
| 28 | loc_og_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 52382.000000 | 66296.000000 | 91.121447 | 152.997805 | 0.000000 | 9.830000 | 40.350000 | 109.245000 | 4961.330000 | nan | nan | nan |
| 29 | loc_og_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 66622.000000 | 67231.000000 | 3.743179 | 13.319542 | 0.000000 | 0.000000 | 0.000000 | 2.060000 | 617.580000 | nan | nan | nan |
| 30 | loc_og_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 66633.000000 | 67312.000000 | 3.777031 | 13.568110 | 0.000000 | 0.000000 | 0.000000 | 2.080000 | 815.330000 | nan | nan | nan |
| 31 | loc_og_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 66687.000000 | 66296.000000 | 3.661652 | 13.009193 | 0.000000 | 0.000000 | 0.000000 | 2.030000 | 588.290000 | nan | nan | nan |
| 32 | loc_og_t2c_mou_6 | float64 | 2768.000000 | 0.039543 | 68053.000000 | 67231.000000 | 1.126025 | 5.741811 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 342.860000 | nan | nan | nan |
| 33 | loc_og_t2c_mou_7 | float64 | 2687.000000 | 0.038386 | 67895.000000 | 67312.000000 | 1.361052 | 7.914113 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 916.240000 | nan | nan | nan |
| 34 | loc_og_t2c_mou_8 | float64 | 3703.000000 | 0.052901 | 67794.000000 | 66296.000000 | 1.420840 | 6.542202 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 351.830000 | nan | nan | nan |
| 35 | loc_og_mou_6 | float64 | 2768.000000 | 0.039543 | 47689.000000 | 67231.000000 | 143.893585 | 252.034597 | 0.000000 | 17.235000 | 65.190000 | 167.880000 | 10643.380000 | nan | nan | nan |
| 36 | loc_og_mou_7 | float64 | 2687.000000 | 0.038386 | 47986.000000 | 67312.000000 | 140.750120 | 246.313148 | 0.000000 | 17.590000 | 63.430000 | 163.932500 | 7674.780000 | nan | nan | nan |
| 37 | loc_og_mou_8 | float64 | 3703.000000 | 0.052901 | 48048.000000 | 66296.000000 | 140.476486 | 245.342359 | 0.000000 | 17.237500 | 63.520000 | 165.615000 | 11039.910000 | nan | nan | nan |
| 38 | std_og_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 55444.000000 | 67231.000000 | 80.619382 | 255.098355 | 0.000000 | 0.000000 | 0.000000 | 31.020000 | 7366.580000 | nan | nan | nan |
| 39 | std_og_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 55253.000000 | 67312.000000 | 83.775851 | 266.693254 | 0.000000 | 0.000000 | 0.000000 | 31.300000 | 8133.660000 | nan | nan | nan |
| 40 | std_og_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 55515.000000 | 66296.000000 | 83.471486 | 267.021929 | 0.000000 | 0.000000 | 0.000000 | 30.760000 | 8014.430000 | nan | nan | nan |
| 41 | std_og_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 53939.000000 | 67231.000000 | 88.152110 | 255.771554 | 0.000000 | 0.000000 | 3.980000 | 53.745000 | 8314.760000 | nan | nan | nan |
| 42 | std_og_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 53764.000000 | 67312.000000 | 91.538615 | 267.532089 | 0.000000 | 0.000000 | 3.710000 | 54.640000 | 6622.540000 | nan | nan | nan |
| 43 | std_og_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 54030.000000 | 66296.000000 | 90.586999 | 270.032002 | 0.000000 | 0.000000 | 3.300000 | 52.660000 | 13950.040000 | nan | nan | nan |
| 44 | std_og_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 67924.000000 | 67231.000000 | 1.126377 | 8.136645 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 628.560000 | nan | nan | nan |
| 45 | std_og_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 68007.000000 | 67312.000000 | 1.084062 | 8.325206 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 465.790000 | nan | nan | nan |
| 46 | std_og_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 68053.000000 | 66296.000000 | 1.057739 | 7.696853 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 354.160000 | nan | nan | nan |
| 47 | std_og_t2c_mou_6 | float64 | 2768.000000 | 0.039543 | 69997.000000 | 67231.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 48 | std_og_t2c_mou_7 | float64 | 2687.000000 | 0.038386 | 69997.000000 | 67312.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 49 | std_og_t2c_mou_8 | float64 | 3703.000000 | 0.052901 | 69997.000000 | 66296.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 50 | std_og_mou_6 | float64 | 2768.000000 | 0.039543 | 48161.000000 | 67231.000000 | 169.900601 | 392.046600 | 0.000000 | 0.000000 | 11.730000 | 146.335000 | 8432.990000 | nan | nan | nan |
| 51 | std_og_mou_7 | float64 | 2687.000000 | 0.038386 | 47910.000000 | 67312.000000 | 176.401217 | 409.299501 | 0.000000 | 0.000000 | 11.260000 | 151.645000 | 8155.530000 | nan | nan | nan |
| 52 | std_og_mou_8 | float64 | 3703.000000 | 0.052901 | 48253.000000 | 66296.000000 | 175.118852 | 410.697098 | 0.000000 | 0.000000 | 10.505000 | 149.015000 | 13980.060000 | nan | nan | nan |
| 53 | isd_og_mou_6 | float64 | 2768.000000 | 0.039543 | 68895.000000 | 67231.000000 | 0.845763 | 29.747486 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5900.660000 | nan | nan | nan |
| 54 | isd_og_mou_7 | float64 | 2687.000000 | 0.038386 | 68914.000000 | 67312.000000 | 0.811100 | 29.220073 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5490.280000 | nan | nan | nan |
| 55 | isd_og_mou_8 | float64 | 3703.000000 | 0.052901 | 68990.000000 | 66296.000000 | 0.841648 | 29.563367 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5681.540000 | nan | nan | nan |
| 56 | spl_og_mou_6 | float64 | 2768.000000 | 0.039543 | 66495.000000 | 67231.000000 | 3.958619 | 15.854529 | 0.000000 | 0.000000 | 0.000000 | 2.400000 | 1023.210000 | nan | nan | nan |
| 57 | spl_og_mou_7 | float64 | 2687.000000 | 0.038386 | 66171.000000 | 67312.000000 | 4.976783 | 22.229842 | 0.000000 | 0.000000 | 0.000000 | 3.660000 | 2372.510000 | nan | nan | nan |
| 58 | spl_og_mou_8 | float64 | 3703.000000 | 0.052901 | 66150.000000 | 66296.000000 | 5.045027 | 17.708507 | 0.000000 | 0.000000 | 0.000000 | 4.002500 | 1075.080000 | nan | nan | nan |
| 59 | og_others_6 | float64 | 2768.000000 | 0.039543 | 69084.000000 | 67231.000000 | 0.462581 | 4.768437 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 800.890000 | nan | nan | nan |
| 60 | og_others_7 | float64 | 2687.000000 | 0.038386 | 69850.000000 | 67312.000000 | 0.024425 | 1.716430 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 270.240000 | nan | nan | nan |
| 61 | og_others_8 | float64 | 3703.000000 | 0.052901 | 69819.000000 | 66296.000000 | 0.033059 | 2.232547 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 394.930000 | nan | nan | nan |
| 62 | total_og_mou_6 | float64 | 0.000000 | 0.000000 | 36864.000000 | 69999.000000 | 306.451436 | 465.502866 | 0.000000 | 44.780000 | 145.280000 | 374.305000 | 10674.030000 | nan | nan | nan |
| 63 | total_og_mou_7 | float64 | 0.000000 | 0.000000 | 36804.000000 | 69999.000000 | 310.572674 | 479.131770 | 0.000000 | 42.910000 | 141.230000 | 380.045000 | 8285.640000 | nan | nan | nan |
| 64 | total_og_mou_8 | float64 | 0.000000 | 0.000000 | 37102.000000 | 69999.000000 | 304.513065 | 477.936832 | 0.000000 | 38.710000 | 138.360000 | 370.895000 | 14043.060000 | nan | nan | nan |
| 65 | loc_ic_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 58361.000000 | 67231.000000 | 48.043255 | 140.499757 | 0.000000 | 3.030000 | 15.740000 | 46.980000 | 5315.590000 | nan | nan | nan |
| 66 | loc_ic_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 58403.000000 | 67312.000000 | 47.882736 | 147.761124 | 0.000000 | 3.260000 | 15.830000 | 45.690000 | 9324.660000 | nan | nan | nan |
| 67 | loc_ic_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 58552.000000 | 66296.000000 | 47.256388 | 141.249368 | 0.000000 | 3.280000 | 16.040000 | 46.280000 | 10696.230000 | nan | nan | nan |
| 68 | loc_ic_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 50964.000000 | 67231.000000 | 107.152439 | 168.455999 | 0.000000 | 17.390000 | 56.460000 | 132.020000 | 4450.740000 | nan | nan | nan |
| 69 | loc_ic_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 51053.000000 | 67312.000000 | 106.489856 | 165.452459 | 0.000000 | 18.610000 | 56.930000 | 131.010000 | 4455.830000 | nan | nan | nan |
| 70 | loc_ic_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 50976.000000 | 66296.000000 | 108.154731 | 166.223461 | 0.000000 | 18.940000 | 58.210000 | 134.380000 | 6274.190000 | nan | nan | nan |
| 71 | loc_ic_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 63798.000000 | 67231.000000 | 12.050672 | 39.416076 | 0.000000 | 0.000000 | 0.880000 | 8.140000 | 1872.340000 | nan | nan | nan |
| 72 | loc_ic_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 63711.000000 | 67312.000000 | 12.563665 | 43.495179 | 0.000000 | 0.000000 | 0.910000 | 8.230000 | 1983.010000 | nan | nan | nan |
| 73 | loc_ic_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 63906.000000 | 66296.000000 | 11.716763 | 38.606895 | 0.000000 | 0.000000 | 0.930000 | 8.090000 | 1676.580000 | nan | nan | nan |
| 74 | loc_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 45556.000000 | 67231.000000 | 167.255126 | 252.576231 | 0.000000 | 30.630000 | 92.430000 | 208.325000 | 7454.630000 | nan | nan | nan |
| 75 | loc_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 45740.000000 | 67312.000000 | 166.945103 | 254.688718 | 0.000000 | 32.710000 | 92.510000 | 205.530000 | 9669.910000 | nan | nan | nan |
| 76 | loc_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 45816.000000 | 66296.000000 | 167.136761 | 249.288410 | 0.000000 | 32.810000 | 93.890000 | 208.060000 | 10830.160000 | nan | nan | nan |
| 77 | std_ic_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 64675.000000 | 67231.000000 | 9.476958 | 51.664472 | 0.000000 | 0.000000 | 0.000000 | 4.060000 | 3336.380000 | nan | nan | nan |
| 78 | std_ic_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 64519.000000 | 67312.000000 | 9.873468 | 56.137824 | 0.000000 | 0.000000 | 0.000000 | 4.180000 | 4708.710000 | nan | nan | nan |
| 79 | std_ic_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 64631.000000 | 66296.000000 | 9.910217 | 54.248186 | 0.000000 | 0.000000 | 0.000000 | 4.052500 | 3930.240000 | nan | nan | nan |
| 80 | std_ic_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 62067.000000 | 67231.000000 | 20.734858 | 80.294236 | 0.000000 | 0.000000 | 2.040000 | 14.960000 | 5647.160000 | nan | nan | nan |
| 81 | std_ic_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 61965.000000 | 67312.000000 | 21.685359 | 87.314510 | 0.000000 | 0.000000 | 2.060000 | 15.830000 | 6141.880000 | nan | nan | nan |
| 82 | std_ic_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 62092.000000 | 66296.000000 | 21.089042 | 81.534344 | 0.000000 | 0.000000 | 2.030000 | 15.310000 | 5512.760000 | nan | nan | nan |
| 83 | std_ic_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 67374.000000 | 67231.000000 | 2.146273 | 16.522232 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1351.110000 | nan | nan | nan |
| 84 | std_ic_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 67303.000000 | 67312.000000 | 2.199395 | 16.171533 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1136.080000 | nan | nan | nan |
| 85 | std_ic_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 67415.000000 | 66296.000000 | 2.075179 | 15.865403 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1394.890000 | nan | nan | nan |
| 86 | std_ic_t2o_mou_6 | float64 | 2768.000000 | 0.039543 | 69997.000000 | 67231.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 87 | std_ic_t2o_mou_7 | float64 | 2687.000000 | 0.038386 | 69997.000000 | 67312.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 88 | std_ic_t2o_mou_8 | float64 | 3703.000000 | 0.052901 | 69997.000000 | 66296.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 89 | std_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 60122.000000 | 67231.000000 | 32.360632 | 104.381082 | 0.000000 | 0.000000 | 5.910000 | 26.780000 | 5712.110000 | nan | nan | nan |
| 90 | std_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 59886.000000 | 67312.000000 | 33.760809 | 114.142230 | 0.000000 | 0.000000 | 5.980000 | 28.160000 | 6745.760000 | nan | nan | nan |
| 91 | std_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 60059.000000 | 66296.000000 | 33.077030 | 108.469864 | 0.000000 | 0.030000 | 5.830000 | 27.615000 | 5658.740000 | nan | nan | nan |
| 92 | total_ic_mou_6 | float64 | 0.000000 | 0.000000 | 42567.000000 | 69999.000000 | 199.710640 | 290.114823 | 0.000000 | 38.640000 | 114.780000 | 251.070000 | 7716.140000 | nan | nan | nan |
| 93 | total_ic_mou_7 | float64 | 0.000000 | 0.000000 | 42581.000000 | 69999.000000 | 201.878029 | 296.771338 | 0.000000 | 41.340000 | 116.330000 | 249.470000 | 9699.010000 | nan | nan | nan |
| 94 | total_ic_mou_8 | float64 | 0.000000 | 0.000000 | 42681.000000 | 69999.000000 | 198.486034 | 288.336731 | 0.000000 | 38.290000 | 114.610000 | 249.710000 | 10830.380000 | nan | nan | nan |
| 95 | spl_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 69920.000000 | 67231.000000 | 0.061932 | 0.164823 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 19.760000 | nan | nan | nan |
| 96 | spl_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 69895.000000 | 67312.000000 | 0.033371 | 0.137322 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 13.460000 | nan | nan | nan |
| 97 | spl_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 69903.000000 | 66296.000000 | 0.040392 | 0.148417 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 16.860000 | nan | nan | nan |
| 98 | isd_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 65577.000000 | 67231.000000 | 7.394167 | 60.951165 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6789.410000 | nan | nan | nan |
| 99 | isd_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 65358.000000 | 67312.000000 | 8.171162 | 63.604165 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5289.540000 | nan | nan | nan |
| 100 | isd_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 65304.000000 | 66296.000000 | 8.348424 | 63.097570 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4127.010000 | nan | nan | nan |
| 101 | ic_others_6 | float64 | 2768.000000 | 0.039543 | 68475.000000 | 67231.000000 | 0.854063 | 12.149144 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1362.940000 | nan | nan | nan |
| 102 | ic_others_7 | float64 | 2687.000000 | 0.038386 | 68340.000000 | 67312.000000 | 1.019680 | 13.225373 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1495.940000 | nan | nan | nan |
| 103 | ic_others_8 | float64 | 3703.000000 | 0.052901 | 68433.000000 | 66296.000000 | 0.963214 | 11.697686 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1209.860000 | nan | nan | nan |
| 104 | total_rech_num_6 | int64 | 0.000000 | 0.000000 | 69905.000000 | 69999.000000 | 7.566522 | 7.041452 | 0.000000 | 3.000000 | 6.000000 | 9.000000 | 170.000000 | nan | nan | nan |
| 105 | total_rech_num_7 | int64 | 0.000000 | 0.000000 | 69903.000000 | 69999.000000 | 7.706667 | 7.050614 | 0.000000 | 3.000000 | 6.000000 | 10.000000 | 138.000000 | nan | nan | nan |
| 106 | total_rech_num_8 | int64 | 0.000000 | 0.000000 | 69907.000000 | 69999.000000 | 7.224932 | 7.195597 | 0.000000 | 3.000000 | 5.000000 | 9.000000 | 138.000000 | nan | nan | nan |
| 107 | total_rech_amt_6 | int64 | 0.000000 | 0.000000 | 67891.000000 | 69999.000000 | 328.139788 | 404.211068 | 0.000000 | 110.000000 | 229.000000 | 438.000000 | 35190.000000 | nan | nan | nan |
| 108 | total_rech_amt_7 | int64 | 0.000000 | 0.000000 | 67852.000000 | 69999.000000 | 322.376363 | 411.070120 | 0.000000 | 100.000000 | 220.000000 | 430.000000 | 40335.000000 | nan | nan | nan |
| 109 | total_rech_amt_8 | int64 | 0.000000 | 0.000000 | 67850.000000 | 69999.000000 | 323.846355 | 426.181405 | 0.000000 | 90.000000 | 225.000000 | 436.000000 | 45320.000000 | nan | nan | nan |
| 110 | max_rech_amt_6 | int64 | 0.000000 | 0.000000 | 69809.000000 | 69999.000000 | 104.569265 | 121.407701 | 0.000000 | 30.000000 | 110.000000 | 120.000000 | 4010.000000 | nan | nan | nan |
| 111 | max_rech_amt_7 | int64 | 0.000000 | 0.000000 | 69821.000000 | 69999.000000 | 104.137573 | 120.782543 | 0.000000 | 30.000000 | 110.000000 | 128.000000 | 3299.000000 | nan | nan | nan |
| 112 | max_rech_amt_8 | int64 | 0.000000 | 0.000000 | 69797.000000 | 69999.000000 | 107.540351 | 124.396750 | 0.000000 | 30.000000 | 98.000000 | 144.000000 | 4449.000000 | nan | nan | nan |
| 113 | date_of_last_rech_6 | object | 1101.000000 | 0.015729 | 69968.000000 | 68898.000000 | nan | nan | nan | nan | nan | nan | nan | 30.000000 | 6/30/2014 | 11880.000000 |
| 114 | date_of_last_rech_7 | object | 1234.000000 | 0.017629 | 69967.000000 | 68765.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 7/31/2014 | 12206.000000 |
| 115 | date_of_last_rech_8 | object | 2461.000000 | 0.035158 | 69967.000000 | 67538.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 8/31/2014 | 10324.000000 |
| 116 | last_day_rch_amt_6 | int64 | 0.000000 | 0.000000 | 69820.000000 | 69999.000000 | 63.426949 | 97.954876 | 0.000000 | 0.000000 | 30.000000 | 110.000000 | 4010.000000 | nan | nan | nan |
| 117 | last_day_rch_amt_7 | int64 | 0.000000 | 0.000000 | 69833.000000 | 69999.000000 | 59.294218 | 95.429492 | 0.000000 | 0.000000 | 30.000000 | 110.000000 | 3100.000000 | nan | nan | nan |
| 118 | last_day_rch_amt_8 | int64 | 0.000000 | 0.000000 | 69808.000000 | 69999.000000 | 62.489478 | 101.996729 | 0.000000 | 0.000000 | 30.000000 | 130.000000 | 4449.000000 | nan | nan | nan |
| 119 | date_of_last_rech_data_6 | object | 52431.000000 | 0.749025 | 69968.000000 | 17568.000000 | nan | nan | nan | nan | nan | nan | nan | 30.000000 | 6/30/2014 | 1317.000000 |
| 120 | date_of_last_rech_data_7 | object | 52134.000000 | 0.744782 | 69967.000000 | 17865.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 7/31/2014 | 1282.000000 |
| 121 | date_of_last_rech_data_8 | object | 51582.000000 | 0.736896 | 69967.000000 | 18417.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 8/31/2014 | 1388.000000 |
| 122 | total_rech_data_6 | float64 | 52431.000000 | 0.749025 | 69962.000000 | 17568.000000 | 2.467612 | 2.794610 | 1.000000 | 1.000000 | 1.000000 | 3.000000 | 61.000000 | nan | nan | nan |
| 123 | total_rech_data_7 | float64 | 52134.000000 | 0.744782 | 69957.000000 | 17865.000000 | 2.679989 | 3.073472 | 1.000000 | 1.000000 | 2.000000 | 3.000000 | 54.000000 | nan | nan | nan |
| 124 | total_rech_data_8 | float64 | 51582.000000 | 0.736896 | 69954.000000 | 18417.000000 | 2.652441 | 3.101265 | 1.000000 | 1.000000 | 1.000000 | 3.000000 | 60.000000 | nan | nan | nan |
| 125 | max_rech_data_6 | float64 | 52431.000000 | 0.749025 | 69951.000000 | 17568.000000 | 126.500000 | 109.352573 | 1.000000 | 25.000000 | 145.000000 | 177.000000 | 1555.000000 | nan | nan | nan |
| 126 | max_rech_data_7 | float64 | 52134.000000 | 0.744782 | 69953.000000 | 17865.000000 | 126.402071 | 109.459266 | 1.000000 | 25.000000 | 145.000000 | 177.000000 | 1555.000000 | nan | nan | nan |
| 127 | max_rech_data_8 | float64 | 51582.000000 | 0.736896 | 69949.000000 | 18417.000000 | 125.374925 | 109.648799 | 1.000000 | 25.000000 | 145.000000 | 179.000000 | 1555.000000 | nan | nan | nan |
| 128 | count_rech_2g_6 | float64 | 52431.000000 | 0.749025 | 69968.000000 | 17568.000000 | 1.865323 | 2.566377 | 0.000000 | 1.000000 | 1.000000 | 2.000000 | 42.000000 | nan | nan | nan |
| 129 | count_rech_2g_7 | float64 | 52134.000000 | 0.744782 | 69965.000000 | 17865.000000 | 2.056311 | 2.799916 | 0.000000 | 1.000000 | 1.000000 | 2.000000 | 48.000000 | nan | nan | nan |
| 130 | count_rech_2g_8 | float64 | 51582.000000 | 0.736896 | 69965.000000 | 18417.000000 | 2.016018 | 2.728246 | 0.000000 | 1.000000 | 1.000000 | 2.000000 | 44.000000 | nan | nan | nan |
| 131 | count_rech_3g_6 | float64 | 52431.000000 | 0.749025 | 69975.000000 | 17568.000000 | 0.602288 | 1.279297 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 29.000000 | nan | nan | nan |
| 132 | count_rech_3g_7 | float64 | 52134.000000 | 0.744782 | 69971.000000 | 17865.000000 | 0.623678 | 1.401230 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 34.000000 | nan | nan | nan |
| 133 | count_rech_3g_8 | float64 | 51582.000000 | 0.736896 | 69969.000000 | 18417.000000 | 0.636423 | 1.457058 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 45.000000 | nan | nan | nan |
| 134 | av_rech_amt_data_6 | float64 | 52431.000000 | 0.749025 | 69206.000000 | 17568.000000 | 192.831096 | 190.623115 | 1.000000 | 82.000000 | 154.000000 | 252.000000 | 5920.000000 | nan | nan | nan |
| 135 | av_rech_amt_data_7 | float64 | 52134.000000 | 0.744782 | 69161.000000 | 17865.000000 | 201.455940 | 198.346141 | 1.000000 | 92.000000 | 154.000000 | 252.000000 | 4365.000000 | nan | nan | nan |
| 136 | av_rech_amt_data_8 | float64 | 51582.000000 | 0.736896 | 69143.000000 | 18417.000000 | 196.815792 | 192.280532 | 1.000000 | 84.000000 | 154.000000 | 252.000000 | 4076.000000 | nan | nan | nan |
| 137 | vol_2g_mb_6 | float64 | 0.000000 | 0.000000 | 58505.000000 | 69999.000000 | 51.773924 | 212.513909 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 10285.900000 | nan | nan | nan |
| 138 | vol_2g_mb_7 | float64 | 0.000000 | 0.000000 | 58556.000000 | 69999.000000 | 51.240204 | 211.114667 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 7873.550000 | nan | nan | nan |
| 139 | vol_2g_mb_8 | float64 | 0.000000 | 0.000000 | 58640.000000 | 69999.000000 | 50.127506 | 213.101403 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11117.610000 | nan | nan | nan |
| 140 | vol_3g_mb_6 | float64 | 0.000000 | 0.000000 | 60231.000000 | 69999.000000 | 122.171882 | 554.869965 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 45735.400000 | nan | nan | nan |
| 141 | vol_3g_mb_7 | float64 | 0.000000 | 0.000000 | 59703.000000 | 69999.000000 | 128.934444 | 554.096072 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 28144.120000 | nan | nan | nan |
| 142 | vol_3g_mb_8 | float64 | 0.000000 | 0.000000 | 59336.000000 | 69999.000000 | 135.486541 | 568.310234 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30036.060000 | nan | nan | nan |
| 143 | arpu_3g_6 | float64 | 52431.000000 | 0.749025 | 64492.000000 | 17568.000000 | 90.069931 | 193.600413 | -20.380000 | 0.000000 | 0.520000 | 122.070000 | 5054.370000 | nan | nan | nan |
| 144 | arpu_3g_7 | float64 | 52134.000000 | 0.744782 | 64609.000000 | 17865.000000 | 89.115767 | 195.826990 | -26.040000 | 0.000000 | 0.420000 | 120.860000 | 4980.900000 | nan | nan | nan |
| 145 | arpu_3g_8 | float64 | 51582.000000 | 0.736896 | 64182.000000 | 18417.000000 | 90.618564 | 189.907986 | -24.490000 | 0.000000 | 0.840000 | 122.070000 | 3716.900000 | nan | nan | nan |
| 146 | arpu_2g_6 | float64 | 52431.000000 | 0.749025 | 64609.000000 | 17568.000000 | 86.863900 | 171.321203 | -35.830000 | 0.000000 | 11.300000 | 122.070000 | 5054.350000 | nan | nan | nan |
| 147 | arpu_2g_7 | float64 | 52134.000000 | 0.744782 | 64942.000000 | 17865.000000 | 85.846074 | 178.067280 | -13.090000 | 0.000000 | 8.800000 | 122.070000 | 4809.360000 | nan | nan | nan |
| 148 | arpu_2g_8 | float64 | 51582.000000 | 0.736896 | 64866.000000 | 18417.000000 | 86.348404 | 170.297094 | -55.830000 | 0.000000 | 9.090000 | 122.070000 | 3483.170000 | nan | nan | nan |
| 149 | night_pck_user_6 | float64 | 52431.000000 | 0.749025 | 69996.000000 | 17568.000000 | 0.025273 | 0.156958 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
| 150 | night_pck_user_7 | float64 | 52134.000000 | 0.744782 | 69996.000000 | 17865.000000 | 0.024069 | 0.153269 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
| 151 | night_pck_user_8 | float64 | 51582.000000 | 0.736896 | 69996.000000 | 18417.000000 | 0.021013 | 0.143432 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
| 152 | monthly_2g_6 | int64 | 0.000000 | 0.000000 | 69994.000000 | 69999.000000 | 0.079287 | 0.294719 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.000000 | nan | nan | nan |
| 153 | monthly_2g_7 | int64 | 0.000000 | 0.000000 | 69993.000000 | 69999.000000 | 0.083401 | 0.304802 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.000000 | nan | nan | nan |
| 154 | monthly_2g_8 | int64 | 0.000000 | 0.000000 | 69993.000000 | 69999.000000 | 0.080930 | 0.299254 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.000000 | nan | nan | nan |
| 155 | sachet_2g_6 | int64 | 0.000000 | 0.000000 | 69969.000000 | 69999.000000 | 0.388863 | 1.494206 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 42.000000 | nan | nan | nan |
| 156 | sachet_2g_7 | int64 | 0.000000 | 0.000000 | 69966.000000 | 69999.000000 | 0.441406 | 1.651012 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 48.000000 | nan | nan | nan |
| 157 | sachet_2g_8 | int64 | 0.000000 | 0.000000 | 69967.000000 | 69999.000000 | 0.449492 | 1.632450 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 44.000000 | nan | nan | nan |
| 158 | monthly_3g_6 | int64 | 0.000000 | 0.000000 | 69989.000000 | 69999.000000 | 0.075815 | 0.358905 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9.000000 | nan | nan | nan |
| 159 | monthly_3g_7 | int64 | 0.000000 | 0.000000 | 69986.000000 | 69999.000000 | 0.077730 | 0.383189 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 16.000000 | nan | nan | nan |
| 160 | monthly_3g_8 | int64 | 0.000000 | 0.000000 | 69987.000000 | 69999.000000 | 0.081958 | 0.381821 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 16.000000 | nan | nan | nan |
| 161 | sachet_3g_6 | int64 | 0.000000 | 0.000000 | 69976.000000 | 69999.000000 | 0.075344 | 0.573003 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 29.000000 | nan | nan | nan |
| 162 | sachet_3g_7 | int64 | 0.000000 | 0.000000 | 69973.000000 | 69999.000000 | 0.081444 | 0.634547 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 33.000000 | nan | nan | nan |
| 163 | sachet_3g_8 | int64 | 0.000000 | 0.000000 | 69971.000000 | 69999.000000 | 0.085487 | 0.680035 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 41.000000 | nan | nan | nan |
| 164 | fb_user_6 | float64 | 52431.000000 | 0.749025 | 69996.000000 | 17568.000000 | 0.916325 | 0.276907 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | nan | nan | nan |
| 165 | fb_user_7 | float64 | 52134.000000 | 0.744782 | 69996.000000 | 17865.000000 | 0.909544 | 0.286842 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | nan | nan | nan |
| 166 | fb_user_8 | float64 | 51582.000000 | 0.736896 | 69996.000000 | 18417.000000 | 0.890319 | 0.312501 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | nan | nan | nan |
| 167 | aon | int64 | 0.000000 | 0.000000 | 66544.000000 | 69999.000000 | 1220.639709 | 952.426321 | 180.000000 | 468.000000 | 868.000000 | 1813.000000 | 4337.000000 | nan | nan | nan |
| 168 | aug_vbc_3g | float64 | 0.000000 | 0.000000 | 59390.000000 | 69999.000000 | 68.108597 | 269.328659 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 12916.220000 | nan | nan | nan |
| 169 | jul_vbc_3g | float64 | 0.000000 | 0.000000 | 59742.000000 | 69999.000000 | 65.935830 | 267.899034 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9165.600000 | nan | nan | nan |
| 170 | jun_vbc_3g | float64 | 0.000000 | 0.000000 | 60382.000000 | 69999.000000 | 60.076740 | 257.226810 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11166.210000 | nan | nan | nan |
| 171 | churn_probability | int64 | 0.000000 | 0.000000 | 69997.000000 | 69999.000000 | 0.101887 | 0.302502 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
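The summary table above merges `describe()` statistics with per-column null and duplicate counts into a single assessment frame. The `assess_df` used below is built earlier in the notebook; purely as an illustration, a minimal sketch of such a helper (names here are assumptions, not the notebook's actual implementation) could look like:

```python
import pandas as pd
import numpy as np

def assess(df):
    # one row per column: dtype, null count/percentage, duplicate count ...
    out = pd.DataFrame({
        'variable': df.columns,
        'datatype': df.dtypes.astype(str).values,
        'null_count': df.isna().sum().values,
        'null_pcnt': (df.isna().sum() / len(df)).values,
        'duplicate_count': [df[c].duplicated().sum() for c in df.columns],
    })
    # ... joined with describe() stats (count, mean, std, quantiles, top, freq)
    desc = df.describe(include='all').T.reset_index(drop=True)
    return pd.concat([out, desc], axis=1)

demo = pd.DataFrame({'a': [1.0, np.nan, 1.0], 'b': ['x', 'y', 'y']})
print(assess(demo)[['variable', 'null_count', 'duplicate_count']])
```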
Filter the variables with a null count greater than zero, sort them in descending order, and highlight null percentages above the 0.7 threshold in red.
# filter variables with null_count > 0, sort descending,
# and highlight null_pcnt values above the 0.7 threshold in red
assess_df[assess_df['null_count'] > 0] \
    .sort_values(by=['datatype', 'null_count'], ascending=False) \
    .style.apply(highlight_values, color='red', threshold=0.7,
                 axis=1, subset='null_pcnt')
| variable | datatype | null_count | null_pcnt | duplicate_count | count | mean | std | min | 25% | 50% | 75% | max | unique | top | freq | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 119 | date_of_last_rech_data_6 | object | 52431.000000 | 0.749025 | 69968.000000 | 17568.000000 | nan | nan | nan | nan | nan | nan | nan | 30.000000 | 6/30/2014 | 1317.000000 |
| 120 | date_of_last_rech_data_7 | object | 52134.000000 | 0.744782 | 69967.000000 | 17865.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 7/31/2014 | 1282.000000 |
| 121 | date_of_last_rech_data_8 | object | 51582.000000 | 0.736896 | 69967.000000 | 18417.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 8/31/2014 | 1388.000000 |
| 115 | date_of_last_rech_8 | object | 2461.000000 | 0.035158 | 69967.000000 | 67538.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 8/31/2014 | 10324.000000 |
| 114 | date_of_last_rech_7 | object | 1234.000000 | 0.017629 | 69967.000000 | 68765.000000 | nan | nan | nan | nan | nan | nan | nan | 31.000000 | 7/31/2014 | 12206.000000 |
| 113 | date_of_last_rech_6 | object | 1101.000000 | 0.015729 | 69968.000000 | 68898.000000 | nan | nan | nan | nan | nan | nan | nan | 30.000000 | 6/30/2014 | 11880.000000 |
| 7 | last_date_of_month_8 | object | 733.000000 | 0.010472 | 69997.000000 | 69266.000000 | nan | nan | nan | nan | nan | nan | nan | 1.000000 | 8/31/2014 | 69266.000000 |
| 6 | last_date_of_month_7 | object | 399.000000 | 0.005700 | 69997.000000 | 69600.000000 | nan | nan | nan | nan | nan | nan | nan | 1.000000 | 7/31/2014 | 69600.000000 |
| 122 | total_rech_data_6 | float64 | 52431.000000 | 0.749025 | 69962.000000 | 17568.000000 | 2.467612 | 2.794610 | 1.000000 | 1.000000 | 1.000000 | 3.000000 | 61.000000 | nan | nan | nan |
| 125 | max_rech_data_6 | float64 | 52431.000000 | 0.749025 | 69951.000000 | 17568.000000 | 126.500000 | 109.352573 | 1.000000 | 25.000000 | 145.000000 | 177.000000 | 1555.000000 | nan | nan | nan |
| 128 | count_rech_2g_6 | float64 | 52431.000000 | 0.749025 | 69968.000000 | 17568.000000 | 1.865323 | 2.566377 | 0.000000 | 1.000000 | 1.000000 | 2.000000 | 42.000000 | nan | nan | nan |
| 131 | count_rech_3g_6 | float64 | 52431.000000 | 0.749025 | 69975.000000 | 17568.000000 | 0.602288 | 1.279297 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 29.000000 | nan | nan | nan |
| 134 | av_rech_amt_data_6 | float64 | 52431.000000 | 0.749025 | 69206.000000 | 17568.000000 | 192.831096 | 190.623115 | 1.000000 | 82.000000 | 154.000000 | 252.000000 | 5920.000000 | nan | nan | nan |
| 143 | arpu_3g_6 | float64 | 52431.000000 | 0.749025 | 64492.000000 | 17568.000000 | 90.069931 | 193.600413 | -20.380000 | 0.000000 | 0.520000 | 122.070000 | 5054.370000 | nan | nan | nan |
| 146 | arpu_2g_6 | float64 | 52431.000000 | 0.749025 | 64609.000000 | 17568.000000 | 86.863900 | 171.321203 | -35.830000 | 0.000000 | 11.300000 | 122.070000 | 5054.350000 | nan | nan | nan |
| 149 | night_pck_user_6 | float64 | 52431.000000 | 0.749025 | 69996.000000 | 17568.000000 | 0.025273 | 0.156958 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
| 164 | fb_user_6 | float64 | 52431.000000 | 0.749025 | 69996.000000 | 17568.000000 | 0.916325 | 0.276907 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | nan | nan | nan |
| 123 | total_rech_data_7 | float64 | 52134.000000 | 0.744782 | 69957.000000 | 17865.000000 | 2.679989 | 3.073472 | 1.000000 | 1.000000 | 2.000000 | 3.000000 | 54.000000 | nan | nan | nan |
| 126 | max_rech_data_7 | float64 | 52134.000000 | 0.744782 | 69953.000000 | 17865.000000 | 126.402071 | 109.459266 | 1.000000 | 25.000000 | 145.000000 | 177.000000 | 1555.000000 | nan | nan | nan |
| 129 | count_rech_2g_7 | float64 | 52134.000000 | 0.744782 | 69965.000000 | 17865.000000 | 2.056311 | 2.799916 | 0.000000 | 1.000000 | 1.000000 | 2.000000 | 48.000000 | nan | nan | nan |
| 132 | count_rech_3g_7 | float64 | 52134.000000 | 0.744782 | 69971.000000 | 17865.000000 | 0.623678 | 1.401230 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 34.000000 | nan | nan | nan |
| 135 | av_rech_amt_data_7 | float64 | 52134.000000 | 0.744782 | 69161.000000 | 17865.000000 | 201.455940 | 198.346141 | 1.000000 | 92.000000 | 154.000000 | 252.000000 | 4365.000000 | nan | nan | nan |
| 144 | arpu_3g_7 | float64 | 52134.000000 | 0.744782 | 64609.000000 | 17865.000000 | 89.115767 | 195.826990 | -26.040000 | 0.000000 | 0.420000 | 120.860000 | 4980.900000 | nan | nan | nan |
| 147 | arpu_2g_7 | float64 | 52134.000000 | 0.744782 | 64942.000000 | 17865.000000 | 85.846074 | 178.067280 | -13.090000 | 0.000000 | 8.800000 | 122.070000 | 4809.360000 | nan | nan | nan |
| 150 | night_pck_user_7 | float64 | 52134.000000 | 0.744782 | 69996.000000 | 17865.000000 | 0.024069 | 0.153269 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
| 165 | fb_user_7 | float64 | 52134.000000 | 0.744782 | 69996.000000 | 17865.000000 | 0.909544 | 0.286842 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | nan | nan | nan |
| 124 | total_rech_data_8 | float64 | 51582.000000 | 0.736896 | 69954.000000 | 18417.000000 | 2.652441 | 3.101265 | 1.000000 | 1.000000 | 1.000000 | 3.000000 | 60.000000 | nan | nan | nan |
| 127 | max_rech_data_8 | float64 | 51582.000000 | 0.736896 | 69949.000000 | 18417.000000 | 125.374925 | 109.648799 | 1.000000 | 25.000000 | 145.000000 | 179.000000 | 1555.000000 | nan | nan | nan |
| 130 | count_rech_2g_8 | float64 | 51582.000000 | 0.736896 | 69965.000000 | 18417.000000 | 2.016018 | 2.728246 | 0.000000 | 1.000000 | 1.000000 | 2.000000 | 44.000000 | nan | nan | nan |
| 133 | count_rech_3g_8 | float64 | 51582.000000 | 0.736896 | 69969.000000 | 18417.000000 | 0.636423 | 1.457058 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 45.000000 | nan | nan | nan |
| 136 | av_rech_amt_data_8 | float64 | 51582.000000 | 0.736896 | 69143.000000 | 18417.000000 | 196.815792 | 192.280532 | 1.000000 | 84.000000 | 154.000000 | 252.000000 | 4076.000000 | nan | nan | nan |
| 145 | arpu_3g_8 | float64 | 51582.000000 | 0.736896 | 64182.000000 | 18417.000000 | 90.618564 | 189.907986 | -24.490000 | 0.000000 | 0.840000 | 122.070000 | 3716.900000 | nan | nan | nan |
| 148 | arpu_2g_8 | float64 | 51582.000000 | 0.736896 | 64866.000000 | 18417.000000 | 86.348404 | 170.297094 | -55.830000 | 0.000000 | 9.090000 | 122.070000 | 3483.170000 | nan | nan | nan |
| 151 | night_pck_user_8 | float64 | 51582.000000 | 0.736896 | 69996.000000 | 18417.000000 | 0.021013 | 0.143432 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | nan | nan | nan |
| 166 | fb_user_8 | float64 | 51582.000000 | 0.736896 | 69996.000000 | 18417.000000 | 0.890319 | 0.312501 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | nan | nan | nan |
| 13 | onnet_mou_8 | float64 | 3703.000000 | 0.052901 | 50273.000000 | 66296.000000 | 132.978257 | 311.896596 | 0.000000 | 6.410000 | 32.100000 | 115.060000 | 10752.560000 | nan | nan | nan |
| 16 | offnet_mou_8 | float64 | 3703.000000 | 0.052901 | 44003.000000 | 66296.000000 | 196.543577 | 324.089234 | 0.000000 | 31.575000 | 91.800000 | 229.345000 | 14007.340000 | nan | nan | nan |
| 19 | roam_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 65746.000000 | 66296.000000 | 7.004892 | 53.408135 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4169.810000 | nan | nan | nan |
| 22 | roam_og_mou_8 | float64 | 3703.000000 | 0.052901 | 64783.000000 | 66296.000000 | 9.771783 | 64.618388 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5337.040000 | nan | nan | nan |
| 25 | loc_og_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 58702.000000 | 66296.000000 | 45.686109 | 153.716880 | 0.000000 | 1.610000 | 11.740000 | 39.895000 | 10752.560000 | nan | nan | nan |
| 28 | loc_og_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 52382.000000 | 66296.000000 | 91.121447 | 152.997805 | 0.000000 | 9.830000 | 40.350000 | 109.245000 | 4961.330000 | nan | nan | nan |
| 31 | loc_og_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 66687.000000 | 66296.000000 | 3.661652 | 13.009193 | 0.000000 | 0.000000 | 0.000000 | 2.030000 | 588.290000 | nan | nan | nan |
| 34 | loc_og_t2c_mou_8 | float64 | 3703.000000 | 0.052901 | 67794.000000 | 66296.000000 | 1.420840 | 6.542202 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 351.830000 | nan | nan | nan |
| 37 | loc_og_mou_8 | float64 | 3703.000000 | 0.052901 | 48048.000000 | 66296.000000 | 140.476486 | 245.342359 | 0.000000 | 17.237500 | 63.520000 | 165.615000 | 11039.910000 | nan | nan | nan |
| 40 | std_og_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 55515.000000 | 66296.000000 | 83.471486 | 267.021929 | 0.000000 | 0.000000 | 0.000000 | 30.760000 | 8014.430000 | nan | nan | nan |
| 43 | std_og_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 54030.000000 | 66296.000000 | 90.586999 | 270.032002 | 0.000000 | 0.000000 | 3.300000 | 52.660000 | 13950.040000 | nan | nan | nan |
| 46 | std_og_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 68053.000000 | 66296.000000 | 1.057739 | 7.696853 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 354.160000 | nan | nan | nan |
| 49 | std_og_t2c_mou_8 | float64 | 3703.000000 | 0.052901 | 69997.000000 | 66296.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 52 | std_og_mou_8 | float64 | 3703.000000 | 0.052901 | 48253.000000 | 66296.000000 | 175.118852 | 410.697098 | 0.000000 | 0.000000 | 10.505000 | 149.015000 | 13980.060000 | nan | nan | nan |
| 55 | isd_og_mou_8 | float64 | 3703.000000 | 0.052901 | 68990.000000 | 66296.000000 | 0.841648 | 29.563367 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5681.540000 | nan | nan | nan |
| 58 | spl_og_mou_8 | float64 | 3703.000000 | 0.052901 | 66150.000000 | 66296.000000 | 5.045027 | 17.708507 | 0.000000 | 0.000000 | 0.000000 | 4.002500 | 1075.080000 | nan | nan | nan |
| 61 | og_others_8 | float64 | 3703.000000 | 0.052901 | 69819.000000 | 66296.000000 | 0.033059 | 2.232547 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 394.930000 | nan | nan | nan |
| 67 | loc_ic_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 58552.000000 | 66296.000000 | 47.256388 | 141.249368 | 0.000000 | 3.280000 | 16.040000 | 46.280000 | 10696.230000 | nan | nan | nan |
| 70 | loc_ic_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 50976.000000 | 66296.000000 | 108.154731 | 166.223461 | 0.000000 | 18.940000 | 58.210000 | 134.380000 | 6274.190000 | nan | nan | nan |
| 73 | loc_ic_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 63906.000000 | 66296.000000 | 11.716763 | 38.606895 | 0.000000 | 0.000000 | 0.930000 | 8.090000 | 1676.580000 | nan | nan | nan |
| 76 | loc_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 45816.000000 | 66296.000000 | 167.136761 | 249.288410 | 0.000000 | 32.810000 | 93.890000 | 208.060000 | 10830.160000 | nan | nan | nan |
| 79 | std_ic_t2t_mou_8 | float64 | 3703.000000 | 0.052901 | 64631.000000 | 66296.000000 | 9.910217 | 54.248186 | 0.000000 | 0.000000 | 0.000000 | 4.052500 | 3930.240000 | nan | nan | nan |
| 82 | std_ic_t2m_mou_8 | float64 | 3703.000000 | 0.052901 | 62092.000000 | 66296.000000 | 21.089042 | 81.534344 | 0.000000 | 0.000000 | 2.030000 | 15.310000 | 5512.760000 | nan | nan | nan |
| 85 | std_ic_t2f_mou_8 | float64 | 3703.000000 | 0.052901 | 67415.000000 | 66296.000000 | 2.075179 | 15.865403 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1394.890000 | nan | nan | nan |
| 88 | std_ic_t2o_mou_8 | float64 | 3703.000000 | 0.052901 | 69997.000000 | 66296.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 91 | std_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 60059.000000 | 66296.000000 | 33.077030 | 108.469864 | 0.000000 | 0.030000 | 5.830000 | 27.615000 | 5658.740000 | nan | nan | nan |
| 97 | spl_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 69903.000000 | 66296.000000 | 0.040392 | 0.148417 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 16.860000 | nan | nan | nan |
| 100 | isd_ic_mou_8 | float64 | 3703.000000 | 0.052901 | 65304.000000 | 66296.000000 | 8.348424 | 63.097570 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4127.010000 | nan | nan | nan |
| 103 | ic_others_8 | float64 | 3703.000000 | 0.052901 | 68433.000000 | 66296.000000 | 0.963214 | 11.697686 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1209.860000 | nan | nan | nan |
| 11 | onnet_mou_6 | float64 | 2768.000000 | 0.039543 | 49941.000000 | 67231.000000 | 133.153275 | 299.963093 | 0.000000 | 7.410000 | 34.110000 | 119.390000 | 7376.710000 | nan | nan | nan |
| 14 | offnet_mou_6 | float64 | 2768.000000 | 0.039543 | 43744.000000 | 67231.000000 | 198.874771 | 316.818355 | 0.000000 | 34.860000 | 96.480000 | 232.990000 | 8362.360000 | nan | nan | nan |
| 17 | roam_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 64692.000000 | 67231.000000 | 9.765435 | 57.374429 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2850.980000 | nan | nan | nan |
| 20 | roam_og_mou_6 | float64 | 2768.000000 | 0.039543 | 63430.000000 | 67231.000000 | 14.186457 | 73.469261 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 3775.110000 | nan | nan | nan |
| 23 | loc_og_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 58507.000000 | 67231.000000 | 46.904854 | 150.971758 | 0.000000 | 1.660000 | 11.910000 | 40.740000 | 6431.330000 | nan | nan | nan |
| 26 | loc_og_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 52150.000000 | 67231.000000 | 93.238231 | 162.046699 | 0.000000 | 9.920000 | 41.030000 | 110.430000 | 4696.830000 | nan | nan | nan |
| 29 | loc_og_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 66622.000000 | 67231.000000 | 3.743179 | 13.319542 | 0.000000 | 0.000000 | 0.000000 | 2.060000 | 617.580000 | nan | nan | nan |
| 32 | loc_og_t2c_mou_6 | float64 | 2768.000000 | 0.039543 | 68053.000000 | 67231.000000 | 1.126025 | 5.741811 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 342.860000 | nan | nan | nan |
| 35 | loc_og_mou_6 | float64 | 2768.000000 | 0.039543 | 47689.000000 | 67231.000000 | 143.893585 | 252.034597 | 0.000000 | 17.235000 | 65.190000 | 167.880000 | 10643.380000 | nan | nan | nan |
| 38 | std_og_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 55444.000000 | 67231.000000 | 80.619382 | 255.098355 | 0.000000 | 0.000000 | 0.000000 | 31.020000 | 7366.580000 | nan | nan | nan |
| 41 | std_og_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 53939.000000 | 67231.000000 | 88.152110 | 255.771554 | 0.000000 | 0.000000 | 3.980000 | 53.745000 | 8314.760000 | nan | nan | nan |
| 44 | std_og_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 67924.000000 | 67231.000000 | 1.126377 | 8.136645 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 628.560000 | nan | nan | nan |
| 47 | std_og_t2c_mou_6 | float64 | 2768.000000 | 0.039543 | 69997.000000 | 67231.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 50 | std_og_mou_6 | float64 | 2768.000000 | 0.039543 | 48161.000000 | 67231.000000 | 169.900601 | 392.046600 | 0.000000 | 0.000000 | 11.730000 | 146.335000 | 8432.990000 | nan | nan | nan |
| 53 | isd_og_mou_6 | float64 | 2768.000000 | 0.039543 | 68895.000000 | 67231.000000 | 0.845763 | 29.747486 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5900.660000 | nan | nan | nan |
| 56 | spl_og_mou_6 | float64 | 2768.000000 | 0.039543 | 66495.000000 | 67231.000000 | 3.958619 | 15.854529 | 0.000000 | 0.000000 | 0.000000 | 2.400000 | 1023.210000 | nan | nan | nan |
| 59 | og_others_6 | float64 | 2768.000000 | 0.039543 | 69084.000000 | 67231.000000 | 0.462581 | 4.768437 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 800.890000 | nan | nan | nan |
| 65 | loc_ic_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 58361.000000 | 67231.000000 | 48.043255 | 140.499757 | 0.000000 | 3.030000 | 15.740000 | 46.980000 | 5315.590000 | nan | nan | nan |
| 68 | loc_ic_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 50964.000000 | 67231.000000 | 107.152439 | 168.455999 | 0.000000 | 17.390000 | 56.460000 | 132.020000 | 4450.740000 | nan | nan | nan |
| 71 | loc_ic_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 63798.000000 | 67231.000000 | 12.050672 | 39.416076 | 0.000000 | 0.000000 | 0.880000 | 8.140000 | 1872.340000 | nan | nan | nan |
| 74 | loc_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 45556.000000 | 67231.000000 | 167.255126 | 252.576231 | 0.000000 | 30.630000 | 92.430000 | 208.325000 | 7454.630000 | nan | nan | nan |
| 77 | std_ic_t2t_mou_6 | float64 | 2768.000000 | 0.039543 | 64675.000000 | 67231.000000 | 9.476958 | 51.664472 | 0.000000 | 0.000000 | 0.000000 | 4.060000 | 3336.380000 | nan | nan | nan |
| 80 | std_ic_t2m_mou_6 | float64 | 2768.000000 | 0.039543 | 62067.000000 | 67231.000000 | 20.734858 | 80.294236 | 0.000000 | 0.000000 | 2.040000 | 14.960000 | 5647.160000 | nan | nan | nan |
| 83 | std_ic_t2f_mou_6 | float64 | 2768.000000 | 0.039543 | 67374.000000 | 67231.000000 | 2.146273 | 16.522232 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1351.110000 | nan | nan | nan |
| 86 | std_ic_t2o_mou_6 | float64 | 2768.000000 | 0.039543 | 69997.000000 | 67231.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 89 | std_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 60122.000000 | 67231.000000 | 32.360632 | 104.381082 | 0.000000 | 0.000000 | 5.910000 | 26.780000 | 5712.110000 | nan | nan | nan |
| 95 | spl_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 69920.000000 | 67231.000000 | 0.061932 | 0.164823 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 19.760000 | nan | nan | nan |
| 98 | isd_ic_mou_6 | float64 | 2768.000000 | 0.039543 | 65577.000000 | 67231.000000 | 7.394167 | 60.951165 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6789.410000 | nan | nan | nan |
| 101 | ic_others_6 | float64 | 2768.000000 | 0.039543 | 68475.000000 | 67231.000000 | 0.854063 | 12.149144 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1362.940000 | nan | nan | nan |
| 12 | onnet_mou_7 | float64 | 2687.000000 | 0.038386 | 49922.000000 | 67312.000000 | 133.894438 | 311.277193 | 0.000000 | 6.675000 | 32.280000 | 115.837500 | 8157.780000 | nan | nan | nan |
| 15 | offnet_mou_7 | float64 | 2687.000000 | 0.038386 | 43859.000000 | 67312.000000 | 197.153383 | 322.482226 | 0.000000 | 32.240000 | 91.885000 | 227.630000 | 7043.980000 | nan | nan | nan |
| 18 | roam_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 65758.000000 | 67312.000000 | 7.014568 | 55.960985 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4155.830000 | nan | nan | nan |
| 21 | roam_og_mou_7 | float64 | 2687.000000 | 0.038386 | 64693.000000 | 67312.000000 | 9.842191 | 58.511894 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2812.040000 | nan | nan | nan |
| 24 | loc_og_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 58639.000000 | 67312.000000 | 46.166503 | 154.739002 | 0.000000 | 1.650000 | 11.580000 | 39.760000 | 7400.660000 | nan | nan | nan |
| 27 | loc_og_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 52357.000000 | 67312.000000 | 90.799240 | 153.852597 | 0.000000 | 10.090000 | 40.170000 | 107.540000 | 4557.140000 | nan | nan | nan |
| 30 | loc_og_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 66633.000000 | 67312.000000 | 3.777031 | 13.568110 | 0.000000 | 0.000000 | 0.000000 | 2.080000 | 815.330000 | nan | nan | nan |
| 33 | loc_og_t2c_mou_7 | float64 | 2687.000000 | 0.038386 | 67895.000000 | 67312.000000 | 1.361052 | 7.914113 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 916.240000 | nan | nan | nan |
| 36 | loc_og_mou_7 | float64 | 2687.000000 | 0.038386 | 47986.000000 | 67312.000000 | 140.750120 | 246.313148 | 0.000000 | 17.590000 | 63.430000 | 163.932500 | 7674.780000 | nan | nan | nan |
| 39 | std_og_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 55253.000000 | 67312.000000 | 83.775851 | 266.693254 | 0.000000 | 0.000000 | 0.000000 | 31.300000 | 8133.660000 | nan | nan | nan |
| 42 | std_og_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 53764.000000 | 67312.000000 | 91.538615 | 267.532089 | 0.000000 | 0.000000 | 3.710000 | 54.640000 | 6622.540000 | nan | nan | nan |
| 45 | std_og_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 68007.000000 | 67312.000000 | 1.084062 | 8.325206 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 465.790000 | nan | nan | nan |
| 48 | std_og_t2c_mou_7 | float64 | 2687.000000 | 0.038386 | 69997.000000 | 67312.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 51 | std_og_mou_7 | float64 | 2687.000000 | 0.038386 | 47910.000000 | 67312.000000 | 176.401217 | 409.299501 | 0.000000 | 0.000000 | 11.260000 | 151.645000 | 8155.530000 | nan | nan | nan |
| 54 | isd_og_mou_7 | float64 | 2687.000000 | 0.038386 | 68914.000000 | 67312.000000 | 0.811100 | 29.220073 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5490.280000 | nan | nan | nan |
| 57 | spl_og_mou_7 | float64 | 2687.000000 | 0.038386 | 66171.000000 | 67312.000000 | 4.976783 | 22.229842 | 0.000000 | 0.000000 | 0.000000 | 3.660000 | 2372.510000 | nan | nan | nan |
| 60 | og_others_7 | float64 | 2687.000000 | 0.038386 | 69850.000000 | 67312.000000 | 0.024425 | 1.716430 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 270.240000 | nan | nan | nan |
| 66 | loc_ic_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 58403.000000 | 67312.000000 | 47.882736 | 147.761124 | 0.000000 | 3.260000 | 15.830000 | 45.690000 | 9324.660000 | nan | nan | nan |
| 69 | loc_ic_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 51053.000000 | 67312.000000 | 106.489856 | 165.452459 | 0.000000 | 18.610000 | 56.930000 | 131.010000 | 4455.830000 | nan | nan | nan |
| 72 | loc_ic_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 63711.000000 | 67312.000000 | 12.563665 | 43.495179 | 0.000000 | 0.000000 | 0.910000 | 8.230000 | 1983.010000 | nan | nan | nan |
| 75 | loc_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 45740.000000 | 67312.000000 | 166.945103 | 254.688718 | 0.000000 | 32.710000 | 92.510000 | 205.530000 | 9669.910000 | nan | nan | nan |
| 78 | std_ic_t2t_mou_7 | float64 | 2687.000000 | 0.038386 | 64519.000000 | 67312.000000 | 9.873468 | 56.137824 | 0.000000 | 0.000000 | 0.000000 | 4.180000 | 4708.710000 | nan | nan | nan |
| 81 | std_ic_t2m_mou_7 | float64 | 2687.000000 | 0.038386 | 61965.000000 | 67312.000000 | 21.685359 | 87.314510 | 0.000000 | 0.000000 | 2.060000 | 15.830000 | 6141.880000 | nan | nan | nan |
| 84 | std_ic_t2f_mou_7 | float64 | 2687.000000 | 0.038386 | 67303.000000 | 67312.000000 | 2.199395 | 16.171533 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1136.080000 | nan | nan | nan |
| 87 | std_ic_t2o_mou_7 | float64 | 2687.000000 | 0.038386 | 69997.000000 | 67312.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 90 | std_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 59886.000000 | 67312.000000 | 33.760809 | 114.142230 | 0.000000 | 0.000000 | 5.980000 | 28.160000 | 6745.760000 | nan | nan | nan |
| 96 | spl_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 69895.000000 | 67312.000000 | 0.033371 | 0.137322 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 13.460000 | nan | nan | nan |
| 99 | isd_ic_mou_7 | float64 | 2687.000000 | 0.038386 | 65358.000000 | 67312.000000 | 8.171162 | 63.604165 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5289.540000 | nan | nan | nan |
| 102 | ic_others_7 | float64 | 2687.000000 | 0.038386 | 68340.000000 | 67312.000000 | 1.019680 | 13.225373 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1495.940000 | nan | nan | nan |
| 2 | loc_og_t2o_mou | float64 | 702.000000 | 0.010029 | 69997.000000 | 69297.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 3 | std_og_t2o_mou | float64 | 702.000000 | 0.010029 | 69997.000000 | 69297.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
| 4 | loc_ic_t2o_mou | float64 | 702.000000 | 0.010029 | 69997.000000 | 69297.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan | nan | nan |
However, columns with missing values are not dropped outright, because the missingness itself may carry a time-series pattern (for example, months with no activity). Where appropriate, such columns are imputed with 0.
train_df.std().values
array([2.02071151e+04, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 3.34213918e+02, 3.44366927e+02, 3.51924315e+02,
2.99963093e+02, 3.11277193e+02, 3.11896596e+02, 3.16818355e+02,
3.22482226e+02, 3.24089234e+02, 5.73744292e+01, 5.59609854e+01,
5.34081346e+01, 7.34692609e+01, 5.85118935e+01, 6.46183885e+01,
1.50971758e+02, 1.54739002e+02, 1.53716880e+02, 1.62046699e+02,
1.53852597e+02, 1.52997805e+02, 1.33195423e+01, 1.35681104e+01,
1.30091929e+01, 5.74181127e+00, 7.91411335e+00, 6.54220214e+00,
2.52034597e+02, 2.46313148e+02, 2.45342359e+02, 2.55098355e+02,
2.66693254e+02, 2.67021929e+02, 2.55771554e+02, 2.67532089e+02,
2.70032002e+02, 8.13664503e+00, 8.32520612e+00, 7.69685270e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.92046600e+02,
4.09299501e+02, 4.10697098e+02, 2.97474857e+01, 2.92200735e+01,
2.95633665e+01, 1.58545287e+01, 2.22298416e+01, 1.77085071e+01,
4.76843713e+00, 1.71642999e+00, 2.23254661e+00, 4.65502866e+02,
4.79131770e+02, 4.77936832e+02, 1.40499757e+02, 1.47761124e+02,
1.41249368e+02, 1.68455999e+02, 1.65452459e+02, 1.66223461e+02,
3.94160763e+01, 4.34951785e+01, 3.86068949e+01, 2.52576231e+02,
2.54688718e+02, 2.49288410e+02, 5.16644725e+01, 5.61378242e+01,
5.42481864e+01, 8.02942362e+01, 8.73145097e+01, 8.15343444e+01,
1.65222318e+01, 1.61715334e+01, 1.58654029e+01, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 1.04381082e+02, 1.14142230e+02,
1.08469864e+02, 2.90114823e+02, 2.96771338e+02, 2.88336731e+02,
1.64822724e-01, 1.37321612e-01, 1.48416944e-01, 6.09511652e+01,
6.36041649e+01, 6.30975695e+01, 1.21491437e+01, 1.32253726e+01,
1.16976858e+01, 7.04145191e+00, 7.05061438e+00, 7.19559687e+00,
4.04211068e+02, 4.11070120e+02, 4.26181405e+02, 1.21407701e+02,
1.20782543e+02, 1.24396750e+02, 9.79548755e+01, 9.54294919e+01,
1.01996729e+02, 2.79461023e+00, 3.07347237e+00, 3.10126541e+00,
1.09352573e+02, 1.09459266e+02, 1.09648799e+02, 2.56637681e+00,
2.79991594e+00, 2.72824579e+00, 1.27929748e+00, 1.40122987e+00,
1.45705777e+00, 1.90623115e+02, 1.98346141e+02, 1.92280532e+02,
2.12513909e+02, 2.11114667e+02, 2.13101403e+02, 5.54869965e+02,
5.54096072e+02, 5.68310234e+02, 1.93600413e+02, 1.95826990e+02,
1.89907986e+02, 1.71321203e+02, 1.78067280e+02, 1.70297094e+02,
1.56958244e-01, 1.53269005e-01, 1.43432064e-01, 2.94718551e-01,
3.04801801e-01, 2.99254125e-01, 1.49420606e+00, 1.65101216e+00,
1.63245000e+00, 3.58905190e-01, 3.83188996e-01, 3.81820764e-01,
5.73003275e-01, 6.34547003e-01, 6.80035410e-01, 2.76907467e-01,
2.86842258e-01, 3.12500550e-01, 9.52426321e+02, 2.69328659e+02,
2.67899034e+02, 2.57226810e+02, 3.02502036e-01])
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
id_cols = ["id",'circle_id']
date_cols = ['last_date_of_month_6',
'last_date_of_month_7',
'last_date_of_month_8',
'date_of_last_rech_6',
'date_of_last_rech_7',
'date_of_last_rech_8',
'date_of_last_rech_data_6',
'date_of_last_rech_data_7',
'date_of_last_rech_data_8']
cat_cols = ['night_pck_user_6',
'night_pck_user_7',
'night_pck_user_8',
'fb_user_6',
'fb_user_7',
'fb_user_8'
]
len_rows, len_col = train_df.shape
num_cols = [column for column in train_df.columns if column not in id_cols + date_cols + cat_cols]
# print the number of columns in each list
print("#ID cols: %d\n#Date cols:%d\n#Numeric cols:%d\n#Category cols:%d" % (len(id_cols), len(date_cols), len(num_cols), len(cat_cols)))
# check if we have missed any column or not
assert len(id_cols) + len(date_cols) + len(num_cols) + len(cat_cols) == len_col
#ID cols: 2 #Date cols:9 #Numeric cols:155 #Category cols:6
plt.figure(figsize=(20, 4))
sb.set_style("darkgrid")
sb.set_palette(palette = 'colorblind', n_colors = 10, desat = 0.8)
current_palette = sb.color_palette()
plt.subplot(1,3,1)
sns.distplot(train_df['arpu_6'])
plt.subplot(1,3,2)
sns.distplot(train_df['arpu_7'])
plt.subplot(1,3,3)
sns.distplot(train_df['arpu_8']);
The above plots depict that the ARPU distributions are right skewed: a small proportion of users generate very high average revenue, while the majority generate low average revenue.
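The direction of the skew can also be confirmed numerically with pandas' `skew()` (a positive value indicates a long right tail). A minimal sketch on toy revenue-like data, not the actual `arpu` columns:

```python
import pandas as pd

# toy sample: many low spenders, a few very high spenders (right-skewed)
s = pd.Series([10, 12, 15, 20, 25, 30, 40, 500, 800, 1200])
print(s.skew())  # positive, confirming the long right tail
```

Running the same check on `train_df['arpu_6']` would quantify the skew seen in the plots.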
plt.figure(figsize=(20, 12))
sb.set_style("darkgrid")
sb.set_palette(palette = 'colorblind', n_colors = 10, desat = 0.8)
current_palette = sb.color_palette()
plt.subplot(2,3,1)
sns.countplot(x = 'total_rech_data_6', data = train_df)
plt.subplot(2,3,2)
sns.countplot(x = 'total_rech_data_7', data = train_df)
plt.subplot(2,3,3)
sns.countplot(x = 'total_rech_data_8', data = train_df)
plt.subplot(2,3,4)
sns.boxplot(x = 'max_rech_data_6', data = train_df)
plt.subplot(2,3,5)
sns.boxplot(x = 'max_rech_data_7', data = train_df)
plt.subplot(2,3,6)
sns.boxplot(x = 'max_rech_data_8', data = train_df)
plt.show();
The above plots depict that total_rech_data for months 6, 7 and 8 is right skewed, and that max_rech_data contains outliers in every month.
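The outliers visible in the boxplots can be counted with the standard 1.5×IQR rule. A minimal sketch on toy data (not the actual `max_rech_data` columns):

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3, 4, 100])  # 100 is an obvious outlier
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
# values beyond 1.5*IQR from the quartiles are flagged as outliers
outliers = s[(s < q1 - 1.5*iqr) | (s > q3 + 1.5*iqr)]
print(outliers.tolist())  # [100]
```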
Take a backup of the data prior to data manipulation:
telchurn = train_df.copy(deep=True)
telchurn.isnull().sum()*100/telchurn.shape[0]
id 0.000000
circle_id 0.000000
loc_og_t2o_mou 1.002871
std_og_t2o_mou 1.002871
loc_ic_t2o_mou 1.002871
...
aon 0.000000
aug_vbc_3g 0.000000
jul_vbc_3g 0.000000
jun_vbc_3g 0.000000
churn_probability 0.000000
Length: 172, dtype: float64
Check for standard deviation to understand the variance:
# check for standard deviation
telchurn.std()
id 20207.115084
circle_id 0.000000
loc_og_t2o_mou 0.000000
std_og_t2o_mou 0.000000
loc_ic_t2o_mou 0.000000
...
aon 952.426321
aug_vbc_3g 269.328659
jul_vbc_3g 267.899034
jun_vbc_3g 257.226810
churn_probability 0.302502
Length: 163, dtype: float64
# Imputing with zeroes
recharge_cols = ['total_rech_data_6', 'total_rech_data_7', 'total_rech_data_8',
'count_rech_2g_6', 'count_rech_2g_7', 'count_rech_2g_8',
'count_rech_3g_6', 'count_rech_3g_7', 'count_rech_3g_8',
'max_rech_data_6', 'max_rech_data_7', 'max_rech_data_8',
'av_rech_amt_data_6', 'av_rech_amt_data_7', 'av_rech_amt_data_8',
]
telchurn[recharge_cols].describe(include='all')
| total_rech_data_6 | total_rech_data_7 | total_rech_data_8 | count_rech_2g_6 | count_rech_2g_7 | count_rech_2g_8 | count_rech_3g_6 | count_rech_3g_7 | count_rech_3g_8 | max_rech_data_6 | max_rech_data_7 | max_rech_data_8 | av_rech_amt_data_6 | av_rech_amt_data_7 | av_rech_amt_data_8 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17568.000000 | 17865.000000 | 18417.000000 | 17568.000000 | 17865.000000 | 18417.000000 | 17568.000000 | 17865.000000 | 18417.000000 | 17568.000000 | 17865.000000 | 18417.000000 | 17568.000000 | 17865.000000 | 18417.000000 |
| mean | 2.467612 | 2.679989 | 2.652441 | 1.865323 | 2.056311 | 2.016018 | 0.602288 | 0.623678 | 0.636423 | 126.500000 | 126.402071 | 125.374925 | 192.831096 | 201.455940 | 196.815792 |
| std | 2.794610 | 3.073472 | 3.101265 | 2.566377 | 2.799916 | 2.728246 | 1.279297 | 1.401230 | 1.457058 | 109.352573 | 109.459266 | 109.648799 | 190.623115 | 198.346141 | 192.280532 |
| min | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 25% | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 25.000000 | 25.000000 | 25.000000 | 82.000000 | 92.000000 | 84.000000 |
| 50% | 1.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 145.000000 | 145.000000 | 145.000000 | 154.000000 | 154.000000 | 154.000000 |
| 75% | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 | 177.000000 | 177.000000 | 179.000000 | 252.000000 | 252.000000 | 252.000000 |
| max | 61.000000 | 54.000000 | 60.000000 | 42.000000 | 48.000000 | 44.000000 | 29.000000 | 34.000000 | 45.000000 | 1555.000000 | 1555.000000 | 1555.000000 | 5920.000000 | 4365.000000 | 4076.000000 |
Explore total_rech_data_6,date_of_last_rech_data_6 columns:
telchurn.loc[telchurn.total_rech_data_6.isnull() & telchurn.date_of_last_rech_data_6.isnull(),
["total_rech_data_6", "date_of_last_rech_data_6"]].head(10)
| total_rech_data_6 | date_of_last_rech_data_6 | |
|---|---|---|
| 0 | NaN | NaN |
| 1 | NaN | NaN |
| 2 | NaN | NaN |
| 3 | NaN | NaN |
| 5 | NaN | NaN |
| 6 | NaN | NaN |
| 8 | NaN | NaN |
| 9 | NaN | NaN |
| 11 | NaN | NaN |
| 12 | NaN | NaN |
If a customer did not perform a data recharge in a particular month, both total_rech_data_&lt;month&gt; and date_of_last_rech_data_&lt;month&gt; remain empty. A missing value here therefore genuinely means zero recharges, so imputing 0 is appropriate.
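This reasoning relies on the two columns being missing together. A minimal sketch of that sanity check on a toy frame (the real check would run on `telchurn` with the actual column pair):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'total_rech_data_6': [1.0, np.nan, 3.0, np.nan],
                   'date_of_last_rech_data_6': ['6/20', np.nan, '6/25', np.nan]})
# True only if the count and the date are always missing together
aligned = (df.total_rech_data_6.isnull() == df.date_of_last_rech_data_6.isnull()).all()
print(aligned)  # True
```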
zero_impute = ['total_rech_data_6', 'total_rech_data_7', 'total_rech_data_8',
'av_rech_amt_data_6', 'av_rech_amt_data_7', 'av_rech_amt_data_8',
'max_rech_data_6', 'max_rech_data_7', 'max_rech_data_8']
telchurn[zero_impute] = telchurn[zero_impute].apply(lambda x: x.fillna(0))
Validate the data imputation:
# Validate whether missing values are imputed correctly or not
print("Missing value ratio:\n")
print(telchurn[zero_impute].isnull().sum()*100/telchurn.shape[0])
Missing value ratio: total_rech_data_6 0.0 total_rech_data_7 0.0 total_rech_data_8 0.0 av_rech_amt_data_6 0.0 av_rech_amt_data_7 0.0 av_rech_amt_data_8 0.0 max_rech_data_6 0.0 max_rech_data_7 0.0 max_rech_data_8 0.0 dtype: float64
# summary
print("\n\nSummary statistics:\n"+ '-'*20)
print(telchurn[zero_impute].describe(include='all'))
Summary statistics:
--------------------
total_rech_data_6 total_rech_data_7 total_rech_data_8 \
count 69999.000000 69999.000000 69999.000000
mean 0.619309 0.683981 0.697867
std 1.762010 1.943193 1.973434
min 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000
75% 1.000000 1.000000 1.000000
max 61.000000 54.000000 60.000000
av_rech_amt_data_6 av_rech_amt_data_7 av_rech_amt_data_8 \
count 69999.000000 69999.000000 69999.000000
mean 48.395787 51.415168 51.782975
std 126.923180 133.246541 131.291162
min 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000
75% 8.000000 17.000000 23.000000
max 5920.000000 4365.000000 4076.000000
max_rech_data_6 max_rech_data_7 max_rech_data_8
count 69999.000000 69999.000000 69999.000000
mean 31.748339 32.260075 32.986614
std 77.519536 78.069163 78.808337
min 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000
75% 8.000000 14.000000 17.000000
max 1555.000000 1555.000000 1555.000000
print("Shape before dropping: ", telchurn.shape)
telchurn = telchurn.drop(id_cols + date_cols, axis=1)
print("Shape after dropping: ", telchurn.shape)
Shape before dropping: (69999, 172) Shape after dropping: (69999, 161)
telchurn[cat_cols] = telchurn[cat_cols].apply(lambda x: x.fillna(-1))
print("Missing value ratio:\n")
print(telchurn[cat_cols].isnull().sum()*100/telchurn.shape[0])
Missing value ratio: night_pck_user_6 0.0 night_pck_user_7 0.0 night_pck_user_8 0.0 fb_user_6 0.0 fb_user_7 0.0 fb_user_8 0.0 dtype: float64
initial_cols = telchurn.shape[1]
MISSING_THRESHOLD = 0.7
include_cols = list(telchurn.apply(lambda column: True if column.isnull().sum()/telchurn.shape[0] < MISSING_THRESHOLD else False))
drop_missing = pd.DataFrame({'features':telchurn.columns , 'include': include_cols})
drop_missing.loc[drop_missing.include == True,:]
| features | include | |
|---|---|---|
| 0 | loc_og_t2o_mou | True |
| 1 | std_og_t2o_mou | True |
| 2 | loc_ic_t2o_mou | True |
| 3 | arpu_6 | True |
| 4 | arpu_7 | True |
| ... | ... | ... |
| 156 | aon | True |
| 157 | aug_vbc_3g | True |
| 158 | jul_vbc_3g | True |
| 159 | jun_vbc_3g | True |
| 160 | churn_probability | True |
149 rows × 2 columns
Check for the count of variables dropped from the analysis:
telchurn = telchurn.loc[:, include_cols]
dropped_cols = initial_cols - telchurn.shape[1]
print("{0} columns dropped.".format(dropped_cols))
12 columns dropped.
Re-examine missing values and variance after the column drops:
telchurn.isna().any()
loc_og_t2o_mou True
std_og_t2o_mou True
loc_ic_t2o_mou True
arpu_6 False
arpu_7 False
...
aon False
aug_vbc_3g False
jul_vbc_3g False
jun_vbc_3g False
churn_probability False
Length: 149, dtype: bool
telchurn.std()
loc_og_t2o_mou 0.000000
std_og_t2o_mou 0.000000
loc_ic_t2o_mou 0.000000
arpu_6 334.213918
arpu_7 344.366927
...
aon 952.426321
aug_vbc_3g 269.328659
jul_vbc_3g 267.899034
jun_vbc_3g 257.226810
churn_probability 0.302502
Length: 149, dtype: float64
same_val_col = ["loc_og_t2o_mou","std_og_t2o_mou","loc_ic_t2o_mou"]
telchurn = telchurn.drop(same_val_col, axis=1)
telchurn.isna().any()
arpu_6 False
arpu_7 False
arpu_8 False
onnet_mou_6 True
onnet_mou_7 True
...
aon False
aug_vbc_3g False
jul_vbc_3g False
jun_vbc_3g False
churn_probability False
Length: 146, dtype: bool
nan_list = telchurn.columns[telchurn.isna().any()].tolist()
for i in nan_list:
    telchurn[i] = telchurn[i].fillna(0)
telchurn.columns[telchurn.isna().any()].tolist()
[]
print(telchurn.isnull().sum()*100/telchurn.shape[0])
arpu_6 0.0
arpu_7 0.0
arpu_8 0.0
onnet_mou_6 0.0
onnet_mou_7 0.0
...
aon 0.0
aug_vbc_3g 0.0
jul_vbc_3g 0.0
jun_vbc_3g 0.0
churn_probability 0.0
Length: 146, dtype: float64
Categorize customers as high-value or generic based on the revenue they generate, calculated from the past data for June and July (months 6 and 7).
Total data recharge amount = number of data recharges × average recharge amount:
telchurn['total_data_rech_6'] = telchurn.total_rech_data_6 * telchurn.av_rech_amt_data_6
telchurn['total_data_rech_7'] = telchurn.total_rech_data_7 * telchurn.av_rech_amt_data_7
To find the total amount spent by the customer for recharge for a particular month = total data recharge + total recharge
# calculate total recharge amount for June and July --> call recharge amount + data recharge amount
telchurn['amt_data_6'] = telchurn.total_rech_amt_6 + telchurn.total_data_rech_6
telchurn['amt_data_7'] = telchurn.total_rech_amt_7 + telchurn.total_data_rech_7
# calculate average recharge done by customer in June and July
telchurn['av_amt_data_6_7'] = (telchurn.amt_data_6 + telchurn.amt_data_7)/2
High-value customers are defined here as those who spent more than the 70th percentile; the 80th percentile would be an equally valid choice.
# look at the 70th percentile recharge amount
print("Recharge amount at 70th percentile: {0}".format(telchurn.av_amt_data_6_7.quantile(0.7)))
Recharge amount at 70th percentile: 477.5
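The percentile cutoff directly controls the cohort size: the 70th percentile keeps the top 30% of spenders, the 80th keeps the top 20%. A minimal sketch of comparing cutoffs on simulated spend data (an assumption standing in for `av_amt_data_6_7`):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
spend = pd.Series(rng.exponential(scale=300, size=1000))  # stand-in for av_amt_data_6_7
for q in (0.7, 0.8):
    cutoff = spend.quantile(q)
    kept = (spend >= cutoff).sum()
    print(f"{int(q*100)}th percentile cutoff: {cutoff:.1f}, customers kept: {kept}")
```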
# retain only those customers who have recharged their mobiles with more than or equal to 70th percentile amount
telchurn_churn_filtered = telchurn.loc[telchurn.av_amt_data_6_7 >= telchurn.av_amt_data_6_7.quantile(0.7), :]
telchurn_churn_filtered = telchurn_churn_filtered.reset_index(drop=True)
telchurn_churn_filtered.shape
(21013, 151)
# delete variables created to filter high-value customers
telchurn_churn_filtered = telchurn_churn_filtered.drop(['total_data_rech_6', 'total_data_rech_7',
'amt_data_6', 'amt_data_7', 'av_amt_data_6_7'], axis=1)
telchurn_churn_filtered.shape
(21013, 146)
21013 rows remain after selecting the customers whose recharge value is greater than or equal to that of the 70th-percentile customer.
# derive change variables: month-8 value minus the average of months 6 and 7
diff_base_cols = ['arpu', 'onnet_mou', 'offnet_mou', 'roam_ic_mou', 'roam_og_mou',
                  'loc_og_mou', 'std_og_mou', 'isd_og_mou', 'spl_og_mou', 'total_og_mou',
                  'loc_ic_mou', 'std_ic_mou', 'isd_ic_mou', 'spl_ic_mou', 'total_ic_mou',
                  'total_rech_num', 'total_rech_amt', 'max_rech_amt', 'total_rech_data',
                  'max_rech_data', 'av_rech_amt_data', 'vol_2g_mb', 'vol_3g_mb']
for col in diff_base_cols:
    telchurn_churn_filtered[col + '_diff'] = telchurn_churn_filtered[col + '_8'] \
        - (telchurn_churn_filtered[col + '_6'] + telchurn_churn_filtered[col + '_7'])/2
# summary of one of the difference variables
telchurn_churn_filtered['max_rech_amt_diff'].describe()
count 21013.000000 mean -10.423048 std 132.730444 min -1901.000000 25% -33.000000 50% 0.000000 75% 25.000000 max 3949.000000 Name: max_rech_amt_diff, dtype: float64
cat_cols
['night_pck_user_6', 'night_pck_user_7', 'night_pck_user_8', 'fb_user_6', 'fb_user_7', 'fb_user_8']
num_cols = [column for column in telchurn_churn_filtered.columns if column not in cat_cols]
len(num_cols)
163
def data_type(variable):
    if variable.dtype == np.int64 or variable.dtype == np.float64:
        return 'numerical'
    elif variable.dtype == 'category':
        return 'categorical'

def univariate(variable, stats=True):
    if data_type(variable) == 'numerical':
        sns.distplot(variable)
        if stats == True:
            print(variable.describe())
    elif data_type(variable) == 'categorical':
        sns.countplot(variable)
        if stats == True:
            print(variable.value_counts())
    else:
        print("Invalid variable passed: either pass a numeric variable or a categorical variable.")
univariate(telchurn.arpu_6)
count 69999.000000 mean 283.134365 std 334.213918 min -2258.709000 25% 93.581000 50% 197.484000 75% 370.791000 max 27731.088000 Name: arpu_6, dtype: float64
univariate(telchurn.arpu_7)
count 69999.000000 mean 278.185912 std 344.366927 min -1289.715000 25% 86.714000 50% 191.588000 75% 365.369500 max 35145.834000 Name: arpu_7, dtype: float64
univariate(telchurn.arpu_8)
count 69999.000000 mean 278.858826 std 351.924315 min -945.808000 25% 84.095000 50% 192.234000 75% 369.909000 max 33543.624000 Name: arpu_8, dtype: float64
univariate(telchurn.onnet_mou_8)
count 69999.000000 mean 125.943607 std 304.990552 min 0.000000 25% 4.110000 50% 27.840000 75% 106.280000 max 10752.560000 Name: onnet_mou_8, dtype: float64
def bivariate(var1, var2):
    if data_type(var1) == 'numerical' and data_type(var2) == 'numerical':
        sns.regplot(var1, var2)
    elif (data_type(var1) == 'categorical' and data_type(var2) == 'numerical') or (data_type(var1) == 'numerical' and data_type(var2) == 'categorical'):
        sns.boxplot(var1, var2)
telchurn_churn_filtered.head()
| arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | ... | spl_ic_mou_diff | total_ic_mou_diff | total_rech_num_diff | total_rech_amt_diff | max_rech_amt_diff | total_rech_data_diff | max_rech_data_diff | av_rech_amt_data_diff | vol_2g_mb_diff | vol_3g_mb_diff | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 240.708 | 128.191 | 101.565 | 21.28 | 4.83 | 6.13 | 56.99 | 38.11 | 9.63 | 53.64 | ... | 0.000 | 12.860 | -3.5 | -91.0 | -15.5 | -1.0 | -8.0 | -41.0 | -136.375 | 0.00 |
| 1 | 580.549 | 377.294 | 338.286 | 10.43 | 24.99 | 194.43 | 317.11 | 341.79 | 197.09 | 0.00 | ... | 0.000 | 124.655 | 0.0 | 125.0 | 0.0 | 0.5 | 77.0 | 77.0 | 224.140 | 0.00 |
| 2 | 1130.948 | 905.506 | 479.762 | 859.53 | 372.71 | 0.89 | 293.46 | 450.11 | 386.91 | 0.00 | ... | -0.065 | -320.160 | -12.5 | -711.0 | 34.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.00 |
| 3 | 371.974 | 352.069 | 240.449 | 93.18 | 38.29 | 29.86 | 531.19 | 507.76 | 222.03 | 0.00 | ... | 0.000 | 0.080 | 3.0 | -193.5 | -80.0 | -0.5 | 0.0 | -11.5 | -0.020 | 0.00 |
| 4 | 390.489 | 350.985 | 421.300 | 8.58 | 11.48 | 12.68 | 133.16 | 158.73 | 182.83 | 0.00 | ... | 0.000 | 26.105 | -6.5 | -318.5 | -13.0 | 0.0 | 11.0 | -40.5 | -38.920 | 1017.89 |
5 rows × 169 columns
bivariate(telchurn_churn_filtered.arpu_6, telchurn_churn_filtered.onnet_mou_6)
bivariate(telchurn_churn_filtered.max_rech_data_diff, telchurn_churn_filtered.vol_3g_mb_diff)
pd.crosstab(telchurn_churn_filtered.night_pck_user_8, telchurn_churn_filtered.sachet_3g_8)
| sachet_3g_8 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 18 | 19 | 20 | 21 | 23 | 25 | 29 | 30 | 38 | 41 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| night_pck_user_8 | |||||||||||||||||||||
| -1.0 | 9833 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.0 | 9219 | 1117 | 264 | 118 | 60 | 48 | 25 | 19 | 17 | 10 | ... | 3 | 1 | 3 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 1.0 | 84 | 60 | 18 | 20 | 9 | 14 | 4 | 5 | 5 | 4 | ... | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
3 rows × 28 columns
def cap_outliers(array, k=3):
    upper_limit = array.mean() + k*array.std()
    lower_limit = array.mean() - k*array.std()
    array[array < lower_limit] = lower_limit
    array[array > upper_limit] = upper_limit
    return array
sample_array = list(range(100))
# add outliers to the data
sample_array[0] = -9999
sample_array[99] = 9999
# cap outliers
sample_array = np.array(sample_array)
print("Array after capping outliers: \n", cap_outliers(sample_array, k=2))
Array after capping outliers:
[-2780 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 2877]
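Note that because `sample_array` holds integers, the float limits were truncated on assignment (the cap prints as -2780 rather than its fractional value). A float-safe variant, shown as a sketch rather than a change to the function above, casts first and avoids in-place mutation:

```python
import numpy as np

def cap_outliers_float(array, k=3):
    # cast to float so the caps keep their fractional part,
    # and copy so the caller's array is untouched
    array = np.asarray(array, dtype=float)
    upper = array.mean() + k*array.std()
    lower = array.mean() - k*array.std()
    return np.clip(array, lower, upper)

a = np.array([-9999] + list(range(1, 99)) + [9999])
capped = cap_outliers_float(a, k=2)
print(capped.min(), capped.max())  # caps retain their fractional values
```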
# cap outliers in the numeric columns
telchurn_churn_filtered[num_cols] = telchurn_churn_filtered[num_cols].apply(cap_outliers, axis=0)
telchurn_numeric = telchurn_churn_filtered[num_cols]
telchurn_numeric
| arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | ... | spl_ic_mou_diff | total_ic_mou_diff | total_rech_num_diff | total_rech_amt_diff | max_rech_amt_diff | total_rech_data_diff | max_rech_data_diff | av_rech_amt_data_diff | vol_2g_mb_diff | vol_3g_mb_diff | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 240.708 | 128.191 | 101.565 | 21.28 | 4.83 | 6.13 | 56.99 | 38.11 | 9.63 | 53.64 | ... | 0.000 | 12.860 | -3.5 | -91.0 | -15.500000 | -1.0 | -8.0 | -41.0 | -136.375 | 0.000 |
| 1 | 580.549 | 377.294 | 338.286 | 10.43 | 24.99 | 194.43 | 317.11 | 341.79 | 197.09 | 0.00 | ... | 0.000 | 124.655 | 0.0 | 125.0 | 0.000000 | 0.5 | 77.0 | 77.0 | 224.140 | 0.000 |
| 2 | 1130.948 | 905.506 | 479.762 | 859.53 | 372.71 | 0.89 | 293.46 | 450.11 | 386.91 | 0.00 | ... | -0.065 | -320.160 | -12.5 | -711.0 | 34.000000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| 3 | 371.974 | 352.069 | 240.449 | 93.18 | 38.29 | 29.86 | 531.19 | 507.76 | 222.03 | 0.00 | ... | 0.000 | 0.080 | 3.0 | -193.5 | -80.000000 | -0.5 | 0.0 | -11.5 | -0.020 | 0.000 |
| 4 | 390.489 | 350.985 | 421.300 | 8.58 | 11.48 | 12.68 | 133.16 | 158.73 | 182.83 | 0.00 | ... | 0.000 | 26.105 | -6.5 | -318.5 | -13.000000 | 0.0 | 11.0 | -40.5 | -38.920 | 1017.890 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21008 | 1106.130 | 1609.381 | 1686.392 | 84.91 | 133.44 | 178.08 | 823.86 | 645.89 | 703.19 | 0.00 | ... | 0.000 | 9.810 | -1.0 | 552.0 | 387.768284 | -1.5 | -76.0 | -224.5 | -0.360 | -422.745 |
| 21009 | 196.695 | 298.645 | 192.440 | 1.05 | 0.00 | 5.83 | 44.74 | 32.16 | 30.93 | 0.00 | ... | -0.095 | -4.695 | -2.0 | -111.0 | -14.000000 | -1.5 | 0.0 | -74.0 | -0.180 | -127.775 |
| 21010 | 235.524 | 195.633 | 129.598 | 45.08 | 94.31 | 8.34 | 111.51 | 236.61 | 91.54 | 0.00 | ... | 0.000 | -97.955 | -4.0 | -91.5 | -0.500000 | -5.5 | -87.0 | -180.5 | -1.940 | -312.295 |
| 21011 | 15.760 | 410.924 | 329.136 | 0.00 | 7.36 | 10.93 | 0.00 | 488.46 | 381.64 | 14.96 | ... | 0.000 | 158.565 | 4.0 | 288.5 | 50.000000 | -2.5 | 4.5 | -50.5 | 22.015 | 0.000 |
| 21012 | 238.575 | 245.414 | 145.062 | 14.01 | 7.64 | 6.71 | 30.34 | 16.68 | 12.56 | 25.06 | ... | 0.000 | 0.225 | -2.0 | -170.5 | -81.000000 | 0.0 | -70.5 | -70.5 | -7.290 | -855.635 |
21013 rows × 163 columns
# Correlation matrix
cor = telchurn_numeric.corr()
cor
| arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | ... | spl_ic_mou_diff | total_ic_mou_diff | total_rech_num_diff | total_rech_amt_diff | max_rech_amt_diff | total_rech_data_diff | max_rech_data_diff | av_rech_amt_data_diff | vol_2g_mb_diff | vol_3g_mb_diff | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| arpu_6 | 1.000000 | 0.586362 | 0.501452 | 0.430119 | 0.279857 | 0.248881 | 0.597043 | 0.391466 | 0.345670 | 0.161524 | ... | -0.020918 | -0.118421 | -0.214080 | -0.281766 | -0.044233 | 0.048640 | -0.004060 | -0.002464 | 0.025108 | -0.045151 |
| arpu_7 | 0.586362 | 1.000000 | 0.690120 | 0.282407 | 0.419770 | 0.361100 | 0.401375 | 0.574563 | 0.478608 | 0.136504 | ... | -0.006689 | 0.011345 | -0.021199 | -0.058084 | 0.062313 | 0.079665 | 0.041971 | 0.034692 | 0.055187 | 0.018964 |
| arpu_8 | 0.501452 | 0.690120 | 1.000000 | 0.200814 | 0.299652 | 0.453203 | 0.315521 | 0.439337 | 0.624135 | 0.132482 | ... | 0.012530 | 0.196034 | 0.349514 | 0.473184 | 0.265941 | 0.213139 | 0.219765 | 0.272121 | 0.112700 | 0.184229 |
| onnet_mou_6 | 0.430119 | 0.282407 | 0.200814 | 1.000000 | 0.779622 | 0.663355 | 0.116079 | 0.064316 | 0.063291 | 0.034440 | ... | -0.037531 | -0.067139 | -0.126481 | -0.159292 | 0.007603 | 0.052274 | 0.040704 | 0.049691 | 0.040981 | 0.019481 |
| onnet_mou_7 | 0.279857 | 0.419770 | 0.299652 | 0.779622 | 1.000000 | 0.816120 | 0.063987 | 0.111920 | 0.103502 | 0.057619 | ... | -0.046611 | 0.002737 | -0.025058 | -0.036378 | 0.056736 | 0.066212 | 0.047225 | 0.060206 | 0.051115 | 0.027161 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| total_rech_data_diff | 0.048640 | 0.079665 | 0.213139 | 0.052274 | 0.066212 | 0.080085 | 0.085461 | 0.092335 | 0.109554 | 0.038162 | ... | 0.015885 | 0.050462 | 0.391073 | 0.187262 | 0.073136 | 1.000000 | 0.224628 | 0.570444 | 0.242772 | 0.281984 |
| max_rech_data_diff | -0.004060 | 0.041971 | 0.219765 | 0.040704 | 0.047225 | 0.058654 | 0.055573 | 0.064888 | 0.082442 | 0.012810 | ... | 0.022153 | 0.044678 | 0.086772 | 0.258100 | 0.375260 | 0.224628 | 1.000000 | 0.702422 | 0.266951 | 0.456058 |
| av_rech_amt_data_diff | -0.002464 | 0.034692 | 0.272121 | 0.049691 | 0.060206 | 0.069457 | 0.075034 | 0.082796 | 0.100157 | 0.031745 | ... | 0.021371 | 0.042880 | 0.211268 | 0.328750 | 0.274306 | 0.570444 | 0.702422 | 1.000000 | 0.242432 | 0.587969 |
| vol_2g_mb_diff | 0.025108 | 0.055187 | 0.112700 | 0.040981 | 0.051115 | 0.054523 | 0.052383 | 0.059541 | 0.058895 | -0.002067 | ... | 0.014203 | 0.012609 | 0.098095 | 0.091464 | 0.099001 | 0.242772 | 0.266951 | 0.242432 | 1.000000 | -0.073887 |
| vol_3g_mb_diff | -0.045151 | 0.018964 | 0.184229 | 0.019481 | 0.027161 | 0.029444 | 0.031437 | 0.042183 | 0.054460 | 0.028102 | ... | 0.005277 | 0.042216 | 0.138700 | 0.245367 | 0.207395 | 0.281984 | 0.456058 | 0.587969 | -0.073887 | 1.000000 |
163 rows × 163 columns
Let's plot the correlations on a heatmap for better visualisation:
# Let's check the correlation coefficients to see which variables are highly correlated
plt.figure(figsize = (30, 30))
sns.heatmap(cor, annot = True, cmap="YlGnBu")
plt.show();
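With 163 variables the annotated heatmap is hard to read, so it can help to list the highly correlated pairs programmatically. A minimal sketch on toy data (the same idea would apply to `cor` above; the 0.8 threshold is an assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = rng.normal(size=200)
df = pd.DataFrame({'a': x,
                   'b': x + rng.normal(scale=0.1, size=200),  # near-duplicate of 'a'
                   'c': rng.normal(size=200)})
cor = df.corr()
# keep only the upper triangle so each pair is listed once
upper = cor.where(np.triu(np.ones(cor.shape, dtype=bool), k=1))
high = upper.stack()
high = high[high.abs() > 0.8]
print(high)  # lists ('a', 'b') with its correlation
```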
# import required libraries
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import FeatureUnion
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
#from imblearn.metrics import sensitivity_specificity_support
telchurn_churn_filtered['churn_probability'] = pd.to_numeric(telchurn_churn_filtered['churn_probability'])
X = telchurn_churn_filtered.drop("churn_probability", axis = 1)
y = telchurn_churn_filtered.churn_probability
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 4, stratify = y)
X_train.shape, y_train.shape
((15759, 168), (15759,))
X_test.shape, y_test.shape
((5254, 168), (5254,))
# mean churn rate for each level of the categorical variables
train = pd.concat([X_train, y_train], axis=1)
train.groupby('night_pck_user_6').churn_probability.mean(), \
train.groupby('night_pck_user_7').churn_probability.mean(), \
train.groupby('night_pck_user_8').churn_probability.mean()
(night_pck_user_6 -1.0 0.091037 0.0 0.061534 1.0 0.110355 Name: churn_probability, dtype: float64, night_pck_user_7 -1.0 0.106008 0.0 0.051774 1.0 0.049518 Name: churn_probability, dtype: float64, night_pck_user_8 -1.0 0.129077 0.0 0.027976 1.0 0.019661 Name: churn_probability, dtype: float64)
train.groupby('fb_user_6').churn_probability.mean(), \
train.groupby('fb_user_7').churn_probability.mean(), \
train.groupby('fb_user_8').churn_probability.mean()
(fb_user_6 -1.0 0.091037 0.0 0.067671 1.0 0.062284 Name: churn_probability, dtype: float64, fb_user_7 -1.0 0.106008 0.0 0.059332 1.0 0.051071 Name: churn_probability, dtype: float64, fb_user_8 -1.0 0.129077 0.0 0.071381 1.0 0.022682 Name: churn_probability, dtype: float64)
mapping = {'night_pck_user_6' : {-1: 0.091037, 0: 0.061534, 1: 0.110355},
'night_pck_user_7' : {-1: 0.106008, 0: 0.051774, 1: 0.049518},
'night_pck_user_8' : {-1: 0.129077, 0: 0.027976, 1: 0.019661},
'fb_user_6' : {-1: 0.091037, 0: 0.067671, 1: 0.062284},
'fb_user_7' : {-1: 0.106008, 0: 0.059332, 1: 0.051071},
'fb_user_8' : {-1: 0.129077, 0: 0.071381, 1: 0.022682}
}
X_train.replace(mapping, inplace = True)
X_test.replace(mapping, inplace = True)
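The hard-coded mapping above is effectively target-mean encoding: each category level is replaced by the mean churn rate observed for that level in the training data (the values printed by the groupby cells). A hypothetical helper that derives such a mapping, shown on toy data:

```python
import pandas as pd

def target_mean_mapping(df, cat_col, target_col):
    """Map each level of cat_col to the mean of target_col for that level."""
    return df.groupby(cat_col)[target_col].mean().to_dict()

# toy example: churn outcomes per category level (-1 = missing indicator)
toy = pd.DataFrame({'night_pck_user_6': [-1, -1, 0, 0, 1, 1],
                    'churn_probability': [1, 0, 0, 0, 1, 1]})
mapping = target_mean_mapping(toy, 'night_pck_user_6', 'churn_probability')
print(mapping)
```

Note that the encoding must be computed on training data only and then applied to test data, exactly as done here, to avoid leaking the test labels.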
cat_X_train = [col for col in cat_cols if col not in ['telchurn']]
X_train[cat_X_train].info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 15759 entries, 13528 to 20561 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 night_pck_user_6 15759 non-null float64 1 night_pck_user_7 15759 non-null float64 2 night_pck_user_8 15759 non-null float64 3 fb_user_6 15759 non-null float64 4 fb_user_7 15759 non-null float64 5 fb_user_8 15759 non-null float64 dtypes: float64(6) memory usage: 861.8 KB
X_train.shape
(15759, 168)
# scale, then fit PCA on the training data (a single fit_transform suffices)
pca = Pipeline([('scaler', StandardScaler()), ('pca', PCA())])
churn_pca = pca.fit_transform(X_train)
# extract the fitted PCA model from the pipeline
pca = pca.named_steps['pca']
# look at the cumulative explained variance of the PCA components
print(pd.Series(np.round(pca.explained_variance_ratio_.cumsum(), 4)*100))
0 11.58
1 21.06
2 26.97
3 31.94
4 35.97
...
163 100.00
164 100.00
165 100.00
166 100.00
167 100.00
Length: 168, dtype: float64
features = range(pca.n_components_)
cumulative_variance = np.round(np.cumsum(pca.explained_variance_ratio_)*100, decimals=4)
# scale the figure to roughly 175 x 100 units, divided by 20 as a normalising factor
plt.figure(figsize=(175/20, 100/20))
plt.plot(cumulative_variance)
plt.title('scree plot');
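The choice of 60 components below comes from eyeballing the scree plot; the same choice can be automated by taking the smallest number of components whose cumulative explained variance crosses a target. A sketch (the helper `n_components_for` is illustrative) that needs only the `explained_variance_ratio_` array:

```python
import numpy as np

def n_components_for(variance_ratio, target=0.90):
    """Smallest number of leading components whose cumulative
    explained variance reaches the target fraction."""
    cumulative = np.cumsum(variance_ratio)
    return int(np.argmax(cumulative >= target) + 1)

# toy example: five components explaining 40/25/15/10/10 % of variance
ratios = np.array([0.40, 0.25, 0.15, 0.10, 0.10])
print(n_components_for(ratios, target=0.90))  # 4
```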
PCA_VARS = 60
steps = [('scaler', StandardScaler()),
("pca", PCA(n_components=PCA_VARS)),
("logistic", LogisticRegression(class_weight='balanced'))
]
pipeline = Pipeline(steps)
y_train=y_train.round()
pipeline.fit(X_train, y_train)
# check score on train data
pipeline.score(X_train, y_train)
0.8236563233707722
X_train.shape
(15759, 168)
y_test=y_test.round()
y_pred = pipeline.predict(X_test)
# create confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
# check sensitivity and specificity
# sensitivity, specificity, _ = sensitivity_specificity_support(y_test, y_pred, average='binary')
# print("Sensitivity: \t", round(sensitivity, 2), "\n", "Specificity: \t", round(specificity, 2), sep='')
# check area under curve
y_pred_prob = pipeline.predict_proba(X_test)[:, 1]
print("AUC: \t", round(roc_auc_score(y_test, y_pred_prob),2))
[[3927 892] [ 80 355]] AUC: 0.89
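With `sensitivity_specificity_support` from `imblearn` commented out, the same numbers can be recovered directly from the confusion matrix. A minimal sketch using the matrix printed above:

```python
import numpy as np

def sensitivity_specificity(cm):
    """Sensitivity (recall of class 1) and specificity (recall of class 0)
    from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# the confusion matrix printed above
cm = np.array([[3927, 892], [80, 355]])
sens, spec = sensitivity_specificity(cm)
print(round(sens, 2), round(spec, 2))  # 0.82 0.81
```

The high sensitivity at the cost of precision is expected here, since the logistic regression was fitted with `class_weight='balanced'`.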
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
precision recall f1-score support
0.0 0.98 0.81 0.89 4819
1.0 0.28 0.82 0.42 435
accuracy 0.81 5254
macro avg 0.63 0.82 0.66 5254
weighted avg 0.92 0.81 0.85 5254
test.csv, kaggle file¶================================
X_test_sub = pd.read_csv("test.csv")
X_test_sub.shape
(30000, 171)
X_test_sub.head()
| id | circle_id | loc_og_t2o_mou | std_og_t2o_mou | loc_ic_t2o_mou | last_date_of_month_6 | last_date_of_month_7 | last_date_of_month_8 | arpu_6 | arpu_7 | ... | sachet_3g_6 | sachet_3g_7 | sachet_3g_8 | fb_user_6 | fb_user_7 | fb_user_8 | aon | aug_vbc_3g | jul_vbc_3g | jun_vbc_3g | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69999 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 91.882 | 65.330 | ... | 0 | 0 | 0 | NaN | NaN | NaN | 1692 | 0.00 | 0.00 | 0.00 |
| 1 | 70000 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 414.168 | 515.568 | ... | 0 | 0 | 0 | NaN | NaN | NaN | 2533 | 0.00 | 0.00 | 0.00 |
| 2 | 70001 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 329.844 | 434.884 | ... | 0 | 0 | 0 | NaN | NaN | NaN | 277 | 525.61 | 758.41 | 241.84 |
| 3 | 70002 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 43.550 | 171.390 | ... | 0 | 0 | 0 | NaN | NaN | NaN | 1244 | 0.00 | 0.00 | 0.00 |
| 4 | 70003 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 306.854 | 406.289 | ... | 0 | 0 | 0 | NaN | NaN | NaN | 462 | 0.00 | 0.00 | 0.00 |
5 rows × 171 columns
final_col=list(X_train.columns)
len(final_col)
168
print(final_col)
['arpu_6', 'arpu_7', 'arpu_8', 'onnet_mou_6', 'onnet_mou_7', 'onnet_mou_8', 'offnet_mou_6', 'offnet_mou_7', 'offnet_mou_8', 'roam_ic_mou_6', 'roam_ic_mou_7', 'roam_ic_mou_8', 'roam_og_mou_6', 'roam_og_mou_7', 'roam_og_mou_8', 'loc_og_t2t_mou_6', 'loc_og_t2t_mou_7', 'loc_og_t2t_mou_8', 'loc_og_t2m_mou_6', 'loc_og_t2m_mou_7', 'loc_og_t2m_mou_8', 'loc_og_t2f_mou_6', 'loc_og_t2f_mou_7', 'loc_og_t2f_mou_8', 'loc_og_t2c_mou_6', 'loc_og_t2c_mou_7', 'loc_og_t2c_mou_8', 'loc_og_mou_6', 'loc_og_mou_7', 'loc_og_mou_8', 'std_og_t2t_mou_6', 'std_og_t2t_mou_7', 'std_og_t2t_mou_8', 'std_og_t2m_mou_6', 'std_og_t2m_mou_7', 'std_og_t2m_mou_8', 'std_og_t2f_mou_6', 'std_og_t2f_mou_7', 'std_og_t2f_mou_8', 'std_og_t2c_mou_6', 'std_og_t2c_mou_7', 'std_og_t2c_mou_8', 'std_og_mou_6', 'std_og_mou_7', 'std_og_mou_8', 'isd_og_mou_6', 'isd_og_mou_7', 'isd_og_mou_8', 'spl_og_mou_6', 'spl_og_mou_7', 'spl_og_mou_8', 'og_others_6', 'og_others_7', 'og_others_8', 'total_og_mou_6', 'total_og_mou_7', 'total_og_mou_8', 'loc_ic_t2t_mou_6', 'loc_ic_t2t_mou_7', 'loc_ic_t2t_mou_8', 'loc_ic_t2m_mou_6', 'loc_ic_t2m_mou_7', 'loc_ic_t2m_mou_8', 'loc_ic_t2f_mou_6', 'loc_ic_t2f_mou_7', 'loc_ic_t2f_mou_8', 'loc_ic_mou_6', 'loc_ic_mou_7', 'loc_ic_mou_8', 'std_ic_t2t_mou_6', 'std_ic_t2t_mou_7', 'std_ic_t2t_mou_8', 'std_ic_t2m_mou_6', 'std_ic_t2m_mou_7', 'std_ic_t2m_mou_8', 'std_ic_t2f_mou_6', 'std_ic_t2f_mou_7', 'std_ic_t2f_mou_8', 'std_ic_t2o_mou_6', 'std_ic_t2o_mou_7', 'std_ic_t2o_mou_8', 'std_ic_mou_6', 'std_ic_mou_7', 'std_ic_mou_8', 'total_ic_mou_6', 'total_ic_mou_7', 'total_ic_mou_8', 'spl_ic_mou_6', 'spl_ic_mou_7', 'spl_ic_mou_8', 'isd_ic_mou_6', 'isd_ic_mou_7', 'isd_ic_mou_8', 'ic_others_6', 'ic_others_7', 'ic_others_8', 'total_rech_num_6', 'total_rech_num_7', 'total_rech_num_8', 'total_rech_amt_6', 'total_rech_amt_7', 'total_rech_amt_8', 'max_rech_amt_6', 'max_rech_amt_7', 'max_rech_amt_8', 'last_day_rch_amt_6', 'last_day_rch_amt_7', 'last_day_rch_amt_8', 'total_rech_data_6', 'total_rech_data_7', 
'total_rech_data_8', 'max_rech_data_6', 'max_rech_data_7', 'max_rech_data_8', 'av_rech_amt_data_6', 'av_rech_amt_data_7', 'av_rech_amt_data_8', 'vol_2g_mb_6', 'vol_2g_mb_7', 'vol_2g_mb_8', 'vol_3g_mb_6', 'vol_3g_mb_7', 'vol_3g_mb_8', 'night_pck_user_6', 'night_pck_user_7', 'night_pck_user_8', 'monthly_2g_6', 'monthly_2g_7', 'monthly_2g_8', 'sachet_2g_6', 'sachet_2g_7', 'sachet_2g_8', 'monthly_3g_6', 'monthly_3g_7', 'monthly_3g_8', 'sachet_3g_6', 'sachet_3g_7', 'sachet_3g_8', 'fb_user_6', 'fb_user_7', 'fb_user_8', 'aon', 'aug_vbc_3g', 'jul_vbc_3g', 'jun_vbc_3g', 'arpu_diff', 'onnet_mou_diff', 'offnet_mou_diff', 'roam_ic_mou_diff', 'roam_og_mou_diff', 'loc_og_mou_diff', 'std_og_mou_diff', 'isd_og_mou_diff', 'spl_og_mou_diff', 'total_og_mou_diff', 'loc_ic_mou_diff', 'std_ic_mou_diff', 'isd_ic_mou_diff', 'spl_ic_mou_diff', 'total_ic_mou_diff', 'total_rech_num_diff', 'total_rech_amt_diff', 'max_rech_amt_diff', 'total_rech_data_diff', 'max_rech_data_diff', 'av_rech_amt_data_diff', 'vol_2g_mb_diff', 'vol_3g_mb_diff']
adn_col =[]
for col in final_col:
if col not in list(X_test_sub.columns):
adn_col.append(col)
len(adn_col)
23
del_col =[]
for col in X_test_sub.columns:
if col not in final_col:
del_col.append(col)
X_test_sub = X_test_sub.drop(del_col, axis=1)
len(X_test_sub.columns)
145
# engineer the same *_diff features used in training:
# August value minus the average of the June and July values
diff_features = ['arpu', 'onnet_mou', 'offnet_mou', 'roam_ic_mou', 'roam_og_mou',
                 'loc_og_mou', 'std_og_mou', 'isd_og_mou', 'spl_og_mou',
                 'total_og_mou', 'loc_ic_mou', 'std_ic_mou', 'isd_ic_mou',
                 'spl_ic_mou', 'total_ic_mou', 'total_rech_num', 'total_rech_amt',
                 'max_rech_amt', 'total_rech_data', 'max_rech_data',
                 'av_rech_amt_data', 'vol_2g_mb', 'vol_3g_mb']
for feat in diff_features:
    X_test_sub[feat + '_diff'] = X_test_sub[feat + '_8'] - \
        (X_test_sub[feat + '_6'] + X_test_sub[feat + '_7'])/2
X_test_sub.shape, X_train.shape
((30000, 168), (15759, 168))
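Matching shapes is necessary but not sufficient: the fitted pipeline expects the submission columns in the same order as training. A quick sketch of an explicit alignment step (`align_columns` is a hypothetical helper; the notebook relies on column names matching):

```python
import pandas as pd

def align_columns(df, reference_columns):
    """Reorder df's columns to match the training column order,
    failing loudly if any expected column is missing."""
    missing = [c for c in reference_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[list(reference_columns)]

# toy example: same columns, different order
train_cols = ['a', 'b', 'c']
test_df = pd.DataFrame({'c': [1], 'a': [2], 'b': [3]})
aligned = align_columns(test_df, train_cols)
print(list(aligned.columns))  # ['a', 'b', 'c']
```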
X_train
| arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | ... | spl_ic_mou_diff | total_ic_mou_diff | total_rech_num_diff | total_rech_amt_diff | max_rech_amt_diff | total_rech_data_diff | max_rech_data_diff | av_rech_amt_data_diff | vol_2g_mb_diff | vol_3g_mb_diff | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13528 | 559.135 | 508.842 | 553.107 | 597.83 | 448.53 | 594.29 | 149.26 | 175.73 | 299.99 | 4.180000 | ... | 0.00 | -61.735 | -1.0 | 45.0 | 31.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| 7183 | 381.035 | 660.847 | 199.111 | 364.79 | 617.33 | 51.71 | 124.69 | 258.86 | 40.08 | 0.000000 | ... | 0.00 | -139.375 | -18.0 | -438.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| 14666 | 276.101 | 518.185 | 267.706 | 84.68 | 117.61 | 87.43 | 552.86 | 510.43 | 383.46 | 0.000000 | ... | 0.00 | -312.175 | -2.0 | -196.0 | 17.5 | 0.0 | 0.0 | 0.0 | 21.065 | -156.810 |
| 17922 | 413.743 | 476.101 | 409.262 | 636.51 | 728.43 | 700.23 | 177.04 | 137.53 | 162.48 | 0.000000 | ... | 0.00 | -30.925 | 0.5 | 76.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| 5165 | 497.504 | 444.456 | 293.627 | 739.41 | 185.51 | 66.08 | 481.24 | 673.68 | 534.98 | 0.000000 | ... | 0.00 | -28.100 | -6.5 | -201.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10248 | 593.710 | 176.365 | 468.511 | 319.28 | 6.16 | 41.21 | 191.86 | 227.69 | 467.46 | 248.466563 | ... | 0.00 | 42.745 | -2.5 | 136.0 | 35.0 | -0.5 | 0.0 | -72.5 | -14.460 | -571.775 |
| 6594 | 795.701 | 1075.717 | 820.655 | 1208.24 | 1380.31 | 1310.78 | 245.46 | 782.89 | 478.08 | 0.000000 | ... | 0.00 | -122.420 | -20.0 | 7.5 | 61.5 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| 492 | 407.097 | 488.824 | 260.994 | 107.61 | 142.88 | 77.16 | 446.93 | 483.16 | 195.24 | 0.000000 | ... | -0.13 | -240.415 | 0.0 | -186.0 | -40.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
| 19099 | 490.151 | 441.268 | 393.735 | 337.76 | 218.16 | 396.83 | 427.83 | 403.36 | 229.94 | 0.000000 | ... | 0.00 | 34.625 | -4.5 | -75.5 | 0.0 | -1.0 | -8.5 | -15.5 | -54.400 | 0.000 |
| 20561 | 443.074 | 553.529 | 578.308 | 455.09 | 536.81 | 863.24 | 285.44 | 645.48 | 491.54 | 0.000000 | ... | 0.25 | -18.230 | 1.0 | 95.5 | 20.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.000 |
15759 rows × 168 columns
X_test_sub
| arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | ... | spl_ic_mou_diff | total_ic_mou_diff | total_rech_num_diff | total_rech_amt_diff | max_rech_amt_diff | total_rech_data_diff | max_rech_data_diff | av_rech_amt_data_diff | vol_2g_mb_diff | vol_3g_mb_diff | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 91.882 | 65.330 | 64.445 | 31.78 | 20.23 | 23.11 | 60.16 | 32.16 | 34.83 | 0.00 | ... | -0.05 | -6.385 | -1.0 | -36.5 | -10.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| 1 | 414.168 | 515.568 | 360.868 | 75.51 | 41.21 | 19.84 | 474.34 | 621.84 | 394.94 | 0.00 | ... | 0.00 | -36.325 | 0.5 | 0.0 | 0.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| 2 | 329.844 | 434.884 | 746.239 | 7.54 | 7.86 | 8.40 | 16.98 | 45.81 | 45.04 | 22.81 | ... | 0.00 | 0.000 | -2.5 | 250.0 | 100.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| 3 | 43.550 | 171.390 | 24.400 | 5.31 | 2.16 | 0.00 | 40.04 | 205.01 | 24.01 | 0.00 | ... | 0.00 | 10.065 | -2.0 | -185.0 | -130.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| 4 | 306.854 | 406.289 | 413.329 | 450.93 | 609.03 | 700.68 | 60.94 | 23.84 | 74.16 | 0.00 | ... | 0.00 | 74.430 | -1.0 | 123.0 | 20.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 29995 | 718.870 | 396.259 | 406.150 | 324.46 | 250.01 | 285.41 | 190.36 | 166.14 | 221.96 | 87.76 | ... | 0.00 | 199.105 | -3.0 | -174.0 | -4.5 | NaN | NaN | NaN | -10.990 | -22.535 |
| 29996 | 218.327 | 324.070 | 374.981 | 263.79 | 467.24 | 650.26 | 108.53 | 124.11 | 189.76 | 3.40 | ... | 0.00 | 38.470 | 1.0 | 188.5 | 69.5 | NaN | NaN | NaN | 0.000 | 0.000 |
| 29997 | 139.473 | 38.230 | 180.194 | 11.08 | 14.89 | 19.94 | 37.24 | 16.74 | 45.28 | 0.00 | ... | 0.00 | 6.160 | 5.5 | 165.0 | 25.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| 29998 | 1122.912 | 781.121 | 257.439 | 122.74 | 50.23 | 26.46 | 1854.09 | 1088.83 | 306.16 | 0.00 | ... | 0.00 | -27.575 | -25.0 | -774.0 | -38.0 | NaN | NaN | NaN | 0.000 | 0.000 |
| 29999 | 318.980 | 307.890 | 605.320 | 28.09 | 32.93 | 66.31 | 107.78 | 241.53 | 331.64 | 134.54 | ... | 0.33 | 88.320 | -0.5 | -440.0 | -227.5 | -1.0 | 105.5 | -96.5 | -68.035 | 499.685 |
30000 rows × 168 columns
X_test_sub.isna().any()
arpu_6 False
arpu_7 False
arpu_8 False
onnet_mou_6 True
onnet_mou_7 True
...
total_rech_data_diff True
max_rech_data_diff True
av_rech_amt_data_diff True
vol_2g_mb_diff False
vol_3g_mb_diff False
Length: 168, dtype: bool
# fill remaining missing values with 0
X_test_sub = X_test_sub.fillna(0)
X_test_sub.isna().any()
arpu_6 False
arpu_7 False
arpu_8 False
onnet_mou_6 False
onnet_mou_7 False
...
total_rech_data_diff False
max_rech_data_diff False
av_rech_amt_data_diff False
vol_2g_mb_diff False
vol_3g_mb_diff False
Length: 168, dtype: bool
mapping = {'night_pck_user_6' : {-1: 0.091037, 0: 0.061534, 1: 0.110355},
'night_pck_user_7' : {-1: 0.106008, 0: 0.051774, 1: 0.049518},
'night_pck_user_8' : {-1: 0.129077, 0: 0.027976, 1: 0.019661},
'fb_user_6' : {-1: 0.091037, 0: 0.067671, 1: 0.062284},
'fb_user_7' : {-1: 0.106008, 0: 0.059332, 1: 0.051071},
'fb_user_8' : {-1: 0.129077, 0: 0.071381, 1: 0.022682}
}
#X_train.replace(mapping, inplace = True)
X_test_sub.replace(mapping, inplace = True)
y_pred = pipeline.predict(X_test_sub)
X_test_sub.shape
(30000, 168)
len(y_pred)
30000
id = range(69999, 69999 + 30000)  # test ids start at 69999; note this shadows the builtin id()
len(id)
30000
y_train.value_counts()/y_train.shape
0.0 0.917254 1.0 0.082746 Name: churn_probability, dtype: float64
# PCA
pca = PCA()
# logistic regression - the class weight is used to handle class imbalance - it adjusts the cost function
logistic = LogisticRegression(class_weight={0:0.1, 1: 0.9})
# create pipeline
steps = [("scaler", StandardScaler()),
("pca", pca),
("logistic", logistic)
]
# compile pipeline
pca_logistic = Pipeline(steps)
# hyperparameter space (note: 'l1' requires a solver such as liblinear or saga
# in recent scikit-learn versions; the default lbfgs supports only 'l2')
params = {'pca__n_components': [60, 80], 'logistic__C': [0.1, 0.5, 1, 2, 3, 4, 5, 10], 'logistic__penalty': ['l1', 'l2']}
# create 5 folds
folds = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 4)
# create gridsearch object
model = GridSearchCV(estimator=pca_logistic, cv=folds, param_grid=params, scoring='roc_auc', n_jobs=-1, verbose=1)
# fit model
model.fit(X_train, y_train)
Fitting 5 folds for each of 32 candidates, totalling 160 fits
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=4, shuffle=True),
estimator=Pipeline(steps=[('scaler', StandardScaler()),
('pca', PCA()),
('logistic',
LogisticRegression(class_weight={0: 0.1,
1: 0.9}))]),
n_jobs=-1,
param_grid={'logistic__C': [0.1, 0.5, 1, 2, 3, 4, 5, 10],
'logistic__penalty': ['l1', 'l2'],
'pca__n_components': [60, 80]},
scoring='roc_auc', verbose=1)
telchurn_result = pd.DataFrame(model.cv_results_)
print("Best AUC: ", model.best_score_)
print("Best hyperparameters: ", model.best_params_)
Best AUC: 0.8985566780774261
Best hyperparameters: {'logistic__C': 0.1, 'logistic__penalty': 'l2', 'pca__n_components': 80}
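Beyond the single best score, the full `cv_results_` (already captured in `telchurn_result` above) can be ranked to see how close the runner-up configurations were. A sketch on a mocked-up results dict; the column names follow scikit-learn's `cv_results_` convention, but the values here are illustrative, not the notebook's real scores:

```python
import pandas as pd

# mocked cv_results_-style dict; in the notebook, telchurn_result holds the real one
cv_results = {
    'param_logistic__C': [0.1, 1, 10],
    'param_pca__n_components': [80, 60, 80],
    'mean_test_score': [0.8986, 0.8952, 0.8941],
    'std_test_score': [0.004, 0.006, 0.005],
}
results = pd.DataFrame(cv_results)
top = results.sort_values('mean_test_score', ascending=False).head(3)
print(top[['param_logistic__C', 'mean_test_score']])
```

When the top few rows are within one standard deviation of each other, the simpler configuration (smaller C, fewer components) is usually the safer pick.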
# predict churn on test data
y_pred = model.predict(X_test)
# create confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
# check sensitivity and specificity
# sensitivity, specificity, _ = sensitivity_specificity_support(y_test, y_pred, average='binary')
# print("Sensitivity: \t", round(sensitivity, 2), "\n", "Specificity: \t", round(specificity, 2), sep='')
# check area under curve
y_pred_prob = model.predict_proba(X_test)[:, 1]
print("AUC: \t", round(roc_auc_score(y_test, y_pred_prob),2))
[[4081 738] [ 85 350]] AUC: 0.89
print(classification_report(y_test, y_pred))
precision recall f1-score support
0.0 0.98 0.85 0.91 4819
1.0 0.32 0.80 0.46 435
accuracy 0.84 5254
macro avg 0.65 0.83 0.68 5254
weighted avg 0.93 0.84 0.87 5254
X_train.columns
Index(['arpu_6', 'arpu_7', 'arpu_8', 'onnet_mou_6', 'onnet_mou_7',
'onnet_mou_8', 'offnet_mou_6', 'offnet_mou_7', 'offnet_mou_8',
'roam_ic_mou_6',
...
'spl_ic_mou_diff', 'total_ic_mou_diff', 'total_rech_num_diff',
'total_rech_amt_diff', 'max_rech_amt_diff', 'total_rech_data_diff',
'max_rech_data_diff', 'av_rech_amt_data_diff', 'vol_2g_mb_diff',
'vol_3g_mb_diff'],
dtype='object', length=168)
# random forest - the class weight is used to handle class imbalance - it adjusts the cost function
forest = RandomForestClassifier(class_weight={0:0.1, 1: 0.9}, n_jobs = -1)
# hyperparameter space ('auto' means sqrt(n_features); it was removed in
# scikit-learn >= 1.3, where 'sqrt' should be used instead)
params = {"criterion": ['gini', 'entropy'], "max_features": ['auto', 0.4]}
# create 5 folds
folds = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 4)
# create gridsearch object
model = GridSearchCV(estimator=forest, cv=folds, param_grid=params, scoring='roc_auc', n_jobs=-1, verbose=1)
# fit model
model.fit(X_train, y_train)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=4, shuffle=True),
estimator=RandomForestClassifier(class_weight={0: 0.1, 1: 0.9},
n_jobs=-1),
n_jobs=-1,
param_grid={'criterion': ['gini', 'entropy'],
'max_features': ['auto', 0.4]},
scoring='roc_auc', verbose=1)
# print best hyperparameters
print("Best AUC: ", model.best_score_)
print("Best hyperparameters: ", model.best_params_)
Best AUC: 0.926888306882045
Best hyperparameters: {'criterion': 'entropy', 'max_features': 'auto'}
y_pred = model.predict(X_test)
# create confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
# check sensitivity and specificity
# check area under curve
y_pred_prob = model.predict_proba(X_test)[:, 1]
print("AUC: \t", round(roc_auc_score(y_test, y_pred_prob),2))
[[4738 81] [ 228 207]] AUC: 0.93
print(classification_report(y_test, y_pred))
precision recall f1-score support
0.0 0.95 0.98 0.97 4819
1.0 0.72 0.48 0.57 435
accuracy 0.94 5254
macro avg 0.84 0.73 0.77 5254
weighted avg 0.93 0.94 0.94 5254
test.csv, kaggle file for Random forest¶================================
X_test_sub = pd.read_csv("test.csv")
X_test_sub.shape
final_col=list(X_train.columns)
print(final_col)
['arpu_6', 'arpu_7', 'arpu_8', 'onnet_mou_6', 'onnet_mou_7', 'onnet_mou_8', 'offnet_mou_6', 'offnet_mou_7', 'offnet_mou_8', 'roam_ic_mou_6', 'roam_ic_mou_7', 'roam_ic_mou_8', 'roam_og_mou_6', 'roam_og_mou_7', 'roam_og_mou_8', 'loc_og_t2t_mou_6', 'loc_og_t2t_mou_7', 'loc_og_t2t_mou_8', 'loc_og_t2m_mou_6', 'loc_og_t2m_mou_7', 'loc_og_t2m_mou_8', 'loc_og_t2f_mou_6', 'loc_og_t2f_mou_7', 'loc_og_t2f_mou_8', 'loc_og_t2c_mou_6', 'loc_og_t2c_mou_7', 'loc_og_t2c_mou_8', 'loc_og_mou_6', 'loc_og_mou_7', 'loc_og_mou_8', 'std_og_t2t_mou_6', 'std_og_t2t_mou_7', 'std_og_t2t_mou_8', 'std_og_t2m_mou_6', 'std_og_t2m_mou_7', 'std_og_t2m_mou_8', 'std_og_t2f_mou_6', 'std_og_t2f_mou_7', 'std_og_t2f_mou_8', 'std_og_t2c_mou_6', 'std_og_t2c_mou_7', 'std_og_t2c_mou_8', 'std_og_mou_6', 'std_og_mou_7', 'std_og_mou_8', 'isd_og_mou_6', 'isd_og_mou_7', 'isd_og_mou_8', 'spl_og_mou_6', 'spl_og_mou_7', 'spl_og_mou_8', 'og_others_6', 'og_others_7', 'og_others_8', 'total_og_mou_6', 'total_og_mou_7', 'total_og_mou_8', 'loc_ic_t2t_mou_6', 'loc_ic_t2t_mou_7', 'loc_ic_t2t_mou_8', 'loc_ic_t2m_mou_6', 'loc_ic_t2m_mou_7', 'loc_ic_t2m_mou_8', 'loc_ic_t2f_mou_6', 'loc_ic_t2f_mou_7', 'loc_ic_t2f_mou_8', 'loc_ic_mou_6', 'loc_ic_mou_7', 'loc_ic_mou_8', 'std_ic_t2t_mou_6', 'std_ic_t2t_mou_7', 'std_ic_t2t_mou_8', 'std_ic_t2m_mou_6', 'std_ic_t2m_mou_7', 'std_ic_t2m_mou_8', 'std_ic_t2f_mou_6', 'std_ic_t2f_mou_7', 'std_ic_t2f_mou_8', 'std_ic_t2o_mou_6', 'std_ic_t2o_mou_7', 'std_ic_t2o_mou_8', 'std_ic_mou_6', 'std_ic_mou_7', 'std_ic_mou_8', 'total_ic_mou_6', 'total_ic_mou_7', 'total_ic_mou_8', 'spl_ic_mou_6', 'spl_ic_mou_7', 'spl_ic_mou_8', 'isd_ic_mou_6', 'isd_ic_mou_7', 'isd_ic_mou_8', 'ic_others_6', 'ic_others_7', 'ic_others_8', 'total_rech_num_6', 'total_rech_num_7', 'total_rech_num_8', 'total_rech_amt_6', 'total_rech_amt_7', 'total_rech_amt_8', 'max_rech_amt_6', 'max_rech_amt_7', 'max_rech_amt_8', 'last_day_rch_amt_6', 'last_day_rch_amt_7', 'last_day_rch_amt_8', 'total_rech_data_6', 'total_rech_data_7', 
'total_rech_data_8', 'max_rech_data_6', 'max_rech_data_7', 'max_rech_data_8', 'av_rech_amt_data_6', 'av_rech_amt_data_7', 'av_rech_amt_data_8', 'vol_2g_mb_6', 'vol_2g_mb_7', 'vol_2g_mb_8', 'vol_3g_mb_6', 'vol_3g_mb_7', 'vol_3g_mb_8', 'night_pck_user_6', 'night_pck_user_7', 'night_pck_user_8', 'monthly_2g_6', 'monthly_2g_7', 'monthly_2g_8', 'sachet_2g_6', 'sachet_2g_7', 'sachet_2g_8', 'monthly_3g_6', 'monthly_3g_7', 'monthly_3g_8', 'sachet_3g_6', 'sachet_3g_7', 'sachet_3g_8', 'fb_user_6', 'fb_user_7', 'fb_user_8', 'aon', 'aug_vbc_3g', 'jul_vbc_3g', 'jun_vbc_3g', 'arpu_diff', 'onnet_mou_diff', 'offnet_mou_diff', 'roam_ic_mou_diff', 'roam_og_mou_diff', 'loc_og_mou_diff', 'std_og_mou_diff', 'isd_og_mou_diff', 'spl_og_mou_diff', 'total_og_mou_diff', 'loc_ic_mou_diff', 'std_ic_mou_diff', 'isd_ic_mou_diff', 'spl_ic_mou_diff', 'total_ic_mou_diff', 'total_rech_num_diff', 'total_rech_amt_diff', 'max_rech_amt_diff', 'total_rech_data_diff', 'max_rech_data_diff', 'av_rech_amt_data_diff', 'vol_2g_mb_diff', 'vol_3g_mb_diff']
len(final_col)
168
adn_col =[]
for col in final_col:
if col not in list(X_test_sub.columns):
adn_col.append(col)
len(adn_col)
23
del_col =[]
for col in X_test_sub.columns:
if col not in final_col:
del_col.append(col)
X_test_sub = X_test_sub.drop(del_col, axis=1)
len(X_test_sub.columns)
145
# engineer the same *_diff features used in training:
# August value minus the average of the June and July values
diff_features = ['arpu', 'onnet_mou', 'offnet_mou', 'roam_ic_mou', 'roam_og_mou',
                 'loc_og_mou', 'std_og_mou', 'isd_og_mou', 'spl_og_mou',
                 'total_og_mou', 'loc_ic_mou', 'std_ic_mou', 'isd_ic_mou',
                 'spl_ic_mou', 'total_ic_mou', 'total_rech_num', 'total_rech_amt',
                 'max_rech_amt', 'total_rech_data', 'max_rech_data',
                 'av_rech_amt_data', 'vol_2g_mb', 'vol_3g_mb']
for feat in diff_features:
    X_test_sub[feat + '_diff'] = X_test_sub[feat + '_8'] - \
        (X_test_sub[feat + '_6'] + X_test_sub[feat + '_7'])/2
X_test_sub.shape
(30000, 168)
X_test_sub.isna().any()
arpu_6 False
arpu_7 False
arpu_8 False
onnet_mou_6 True
onnet_mou_7 True
...
total_rech_data_diff True
max_rech_data_diff True
av_rech_amt_data_diff True
vol_2g_mb_diff False
vol_3g_mb_diff False
Length: 168, dtype: bool
# fill remaining missing values with 0
X_test_sub = X_test_sub.fillna(0)
X_test_sub.isna().any()
arpu_6 False
arpu_7 False
arpu_8 False
onnet_mou_6 False
onnet_mou_7 False
...
total_rech_data_diff False
max_rech_data_diff False
av_rech_amt_data_diff False
vol_2g_mb_diff False
vol_3g_mb_diff False
Length: 168, dtype: bool
mapping = {'night_pck_user_6' : {-1: 0.091037, 0: 0.061534, 1: 0.110355},
'night_pck_user_7' : {-1: 0.106008, 0: 0.051774, 1: 0.049518},
'night_pck_user_8' : {-1: 0.129077, 0: 0.027976, 1: 0.019661},
'fb_user_6' : {-1: 0.091037, 0: 0.067671, 1: 0.062284},
'fb_user_7' : {-1: 0.106008, 0: 0.059332, 1: 0.051071},
'fb_user_8' : {-1: 0.129077, 0: 0.071381, 1: 0.022682}
}
#X_train.replace(mapping, inplace = True)
X_test_sub.replace(mapping, inplace = True)
y_pred = model.predict(X_test_sub)
telchurn_res = pd.DataFrame(
{'id': id,
'churn_probability': y_pred
})
telchurn_res['churn_probability'] = telchurn_res['churn_probability'].astype(int)
Exporting the csv file for Kaggle submission:
telchurn_res.to_csv("Submission.csv", index = False)
# run a random forest model on train data
max_features = int(round(np.sqrt(X_train.shape[1]))) # number of variables to consider to split each node
print(max_features)
13
rf_model = RandomForestClassifier(n_estimators=100,
max_features=max_features,
class_weight={0:0.1, 1: 0.9},
oob_score=True,
random_state=4,
verbose=1)
# fit model
rf_model.fit(X_train, y_train)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers. [Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed: 11.8s finished
RandomForestClassifier(class_weight={0: 0.1, 1: 0.9}, max_features=13,
oob_score=True, random_state=4, verbose=1)
rf_model.oob_score_
0.941747572815534
# predict churn on test data
y_pred = rf_model.predict(X_test)
# create confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
# check sensitivity and specificity
# sensitivity, specificity, _ = sensitivity_specificity_support(y_test, y_pred, average='binary')
# print("Sensitivity: \t", round(sensitivity, 2), "\n", "Specificity: \t", round(specificity, 2), sep='')
# check area under curve
y_pred_prob = rf_model.predict_proba(X_test)[:, 1]
print("ROC: \t", round(roc_auc_score(y_test, y_pred_prob),2))
print(classification_report(y_test, y_pred))
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers. [Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed: 0.0s finished [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[[4749 70]
[ 240 195]]
ROC: 0.93
precision recall f1-score support
0.0 0.95 0.99 0.97 4819
1.0 0.74 0.45 0.56 435
accuracy 0.94 5254
macro avg 0.84 0.72 0.76 5254
weighted avg 0.93 0.94 0.93 5254
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed: 0.0s finished
# predictors
features = telchurn_churn_filtered.drop('churn_probability', axis=1).columns
# feature_importance
importance = rf_model.feature_importances_
# create dataframe
feature_importance = pd.DataFrame({'variables': features, 'importance_percentage': importance*100})
feature_importance = feature_importance[['variables', 'importance_percentage']]
# sort features
feature_importance = feature_importance.sort_values('importance_percentage', ascending=False).reset_index(drop=True)
print("Sum of importance=", feature_importance.importance_percentage.sum())
feature_importance
Sum of importance= 100.00000000000001
| variables | importance_percentage | |
|---|---|---|
| 0 | loc_ic_mou_8 | 5.765741 |
| 1 | total_ic_mou_8 | 5.461674 |
| 2 | loc_og_t2m_mou_8 | 3.820271 |
| 3 | loc_ic_t2m_mou_8 | 3.637759 |
| 4 | loc_og_mou_8 | 3.102838 |
| ... | ... | ... |
| 163 | std_og_t2c_mou_7 | 0.000000 |
| 164 | std_ic_t2o_mou_7 | 0.000000 |
| 165 | std_ic_t2o_mou_8 | 0.000000 |
| 166 | std_og_t2c_mou_6 | 0.000000 |
| 167 | std_ic_t2o_mou_6 | 0.000000 |
168 rows × 2 columns
top_n = 30
top_features = feature_importance.variables[0:top_n]
import seaborn as sns
plt.rcParams["figure.figsize"] =(16,10)
sns.heatmap(data=X_train[top_features].corr(), center=0.0, cmap="YlGnBu");
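The heatmap above only visualizes collinearity among the top features. If one wanted to prune highly correlated pairs programmatically rather than by eye, a small helper like the following could do it; the 0.85 threshold and the synthetic demo data are assumptions for illustration, not values used in this notebook:

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.85):
    """Drop one feature from every pair whose absolute Pearson
    correlation exceeds `threshold` (threshold is an assumption)."""
    corr = df.corr().abs()
    # keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# toy illustration with synthetic columns: 'b' is a near-duplicate of 'a'
rng = np.random.default_rng(4)
a = rng.normal(size=100)
demo = pd.DataFrame({'a': a,
                     'b': a * 2 + rng.normal(scale=0.01, size=100),
                     'c': rng.normal(size=100)})
reduced, dropped = drop_correlated(demo)
print(dropped)  # the redundant near-duplicate column is flagged
```

Dropping one member of each strongly correlated pair keeps the feature set compact without losing much signal, which also stabilizes the logistic regression coefficients fitted next.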
# keep only the strongest features (note: slice [0:9] selects the top 9)
top_features_10 = top_features[0:9]
X_train = X_train[top_features_10]
X_test = X_test[top_features_10]
# logistic regression
# note: sklearn's default lbfgs solver does not support penalty='l1';
# those grid cells fail silently, so use solver='liblinear' or 'saga'
# if l1 regularization should actually be searched
steps = [('scaler', StandardScaler()),
("logistic", LogisticRegression(class_weight={0:0.1, 1:0.9}))
]
# compile pipeline
logistic = Pipeline(steps)
# hyperparameter space
params = {'logistic__C': [0.1, 0.5, 1, 2, 3, 4, 5, 10], 'logistic__penalty': ['l1', 'l2']}
# create 5 folds
folds = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 4)
# create gridsearch object
model = GridSearchCV(estimator=logistic, cv=folds, param_grid=params, scoring='roc_auc', n_jobs=-1, verbose=1)
# fit model
model.fit(X_train, y_train)
Fitting 5 folds for each of 16 candidates, totalling 80 fits
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=4, shuffle=True),
estimator=Pipeline(steps=[('scaler', StandardScaler()),
('logistic',
LogisticRegression(class_weight={0: 0.1,
1: 0.9}))]),
n_jobs=-1,
param_grid={'logistic__C': [0.1, 0.5, 1, 2, 3, 4, 5, 10],
'logistic__penalty': ['l1', 'l2']},
scoring='roc_auc', verbose=1)
print("Best AUC: ", model.best_score_)
print("Best hyperparameters: ", model.best_params_)
Best AUC: 0.870941852021421
Best hyperparameters: {'logistic__C': 10, 'logistic__penalty': 'l2'}
# predict churn on test data
y_pred = model.predict(X_test)
# create confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
y_pred_prob = model.predict_proba(X_test)[:, 1]
print("ROC: \t", round(roc_auc_score(y_test, y_pred_prob),2))
[[3964  855]
 [ 100  335]]
ROC: 	 0.85
print(classification_report(y_test, y_pred))
precision recall f1-score support
0.0 0.98 0.82 0.89 4819
1.0 0.28 0.77 0.41 435
accuracy 0.82 5254
macro avg 0.63 0.80 0.65 5254
weighted avg 0.92 0.82 0.85 5254
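The class weights push the logistic model toward churn recall (0.77) at the cost of precision (0.28). A further knob, not exercised in this notebook, is the probability cutoff itself: instead of the default 0.5, one can sweep cutoffs and keep the one that maximizes specificity subject to a sensitivity floor. A sketch under assumed values (the 0.8 floor and the toy data are illustrations only):

```python
import numpy as np

def best_threshold(y_true, y_prob, min_sensitivity=0.8):
    """Among probability cutoffs that keep sensitivity above
    `min_sensitivity`, return the one with the best specificity.
    The 0.8 floor is an assumption, not a figure from this notebook."""
    best = (0.5, 0.0)
    for t in np.arange(0.05, 0.95, 0.05):
        pred = (y_prob >= t).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        tn = np.sum((pred == 0) & (y_true == 0))
        fp = np.sum((pred == 1) & (y_true == 0))
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if sens >= min_sensitivity and spec > best[1]:
            best = (t, spec)
    return best

# toy example with well-separated predicted probabilities
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.55, 0.7, 0.8, 0.9])
print(best_threshold(y_true, y_prob))
```

In a churn setting, missing a churner (a false negative) typically costs more than a retention offer wasted on a loyal customer, which is why a sensitivity floor rather than raw accuracy is the natural constraint.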
logistic_model = model.best_estimator_.named_steps['logistic']
# intercept
intercept_telchurn = pd.DataFrame(logistic_model.intercept_.reshape((1,1)), columns = ['intercept'])
# coefficients (one per retained feature)
logistic_features = list(X_train.columns)
coefficients_telchurn = pd.DataFrame(logistic_model.coef_, columns=logistic_features)
# concatenate dataframes
coefficients = pd.concat([intercept_telchurn, coefficients_telchurn], axis=1)
coefficients
| intercept | loc_ic_mou_8 | total_ic_mou_8 | loc_og_t2m_mou_8 | loc_ic_t2m_mou_8 | loc_og_mou_8 | arpu_diff | last_day_rch_amt_8 | total_rech_amt_diff | total_og_mou_8 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.490875 | -0.680075 | -0.482281 | -0.258217 | -0.425098 | -0.084777 | -0.287468 | -0.222731 | -0.58333 | -0.01796 |
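Because the features were standardized before fitting, each coefficient is the change in churn log-odds per one standard deviation of the feature, and exponentiating gives odds ratios. A sketch using the coefficients from the table above (values copied from the fitted model's output):

```python
import numpy as np
import pandas as pd

# coefficients reported in the table above (standardized features)
coefs = pd.Series({
    'loc_ic_mou_8': -0.680075,
    'total_ic_mou_8': -0.482281,
    'loc_og_t2m_mou_8': -0.258217,
    'loc_ic_t2m_mou_8': -0.425098,
    'loc_og_mou_8': -0.084777,
    'arpu_diff': -0.287468,
    'last_day_rch_amt_8': -0.222731,
    'total_rech_amt_diff': -0.58333,
    'total_og_mou_8': -0.01796,
})

# e^beta: multiplicative change in churn odds per one standard
# deviation increase in the feature
odds_ratios = np.exp(coefs).round(3).sort_values()
print(odds_ratios)
```

Every odds ratio is below 1: higher usage and recharge activity in month 8 lowers the odds of churn, consistent with the intuition that usage decline precedes churn.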
Credentials:¶===================
Please find the contributor's details below:
Primary Author:
- Name: Vamsi Krishna P
- Phone no: +91 9491392912
- Email ID: vamshi.krishna.prime@gmail.com
- LinkedIn: https://www.linkedin.com/in/vamshi-krishna-prime/
- GitHub: https://github.com/vamshi-krishna-prime
- Website: https://www.vamshi-krishna.com
- Other work sample: https://self-driving-car-lane-detection.streamlit.app/